THE GENERAL CANONICAL CORRELATION DISTRIBUTION

By M. S. Bartlett

University of Cambridge, England and University of North Carolina

1. Summary. The general canonical correlation distribution is given as a multiple power series in the true canonical correlations $\rho_i$. When only one true correlation is not zero, this series is expressible as a generalized hypergeometric function, for the two cases both of non-central means, and of correlations proper. In the general case of more than one non-zero true correlation the coefficients in the expansion depend on the conditional moments of the sample correlations between the pairs of transformed variables representing the true canonical variables, when the sample canonical correlations between the sample canonical variables are fixed. Methods are given of obtaining these coefficients for both cases, non-central means and correlations proper; and their form up to the fourth order, corresponding to $O(\rho^8)$ in the expansion, is listed in Appendix I. The detailed terms making up these coefficients are given, in the case of two non-zero correlations, up to the fourth order, and in the general case, up to the third order, in Appendix II.

2. Introductory remarks; the case of zero roots. In the statistical theory of the relation of one vector variate with another (see Hotelling [1]), the simultaneous distribution of the canonical correlations $r_i$, which are the roots of a certain determinantal equation, was first obtained in 1939 (Fisher [2], Hsu [3], Roy [4]) in the special but important case when the true roots or correlations $\rho_i$ are zero. Roy [5] has since investigated the case where the true roots are not zero when these non-zero values arise from non-central means. The present investigation is primarily intended to cover the alternative case where non-zero roots arise from the existence of true correlations $\rho_i$. The method developed is, however, also applicable to the case of non-central means; and it is shown that the general distribution, which for more than one non-zero root becomes very complicated, does not in the case of non-central means agree with the distribution given by Roy [5] except in the case of only one non-zero root.¹

It will be convenient in this introductory section to sketch (with slight modifications) the method used by Hsu [3] to obtain the solution in the case of zero roots, as some of his intermediate formulae are useful for the present developments. We consider a "dependent" vector variate with $p$ components, and an "independent"² vector variate with $q$ components. For definiteness we assume

¹ This conclusion has also been reached by T. W. Anderson, who has given a solution of the non-central means problem in the cases of either one or two non-zero roots (Annals of Math. Stat., Vol. 17 (1946), pp. 409-431).

² The classification of a variate as the "dependent variate" or "independent variate" is in the regression sense, and need not necessarily imply statistical dependence or independence.




$p \le q$, and the sample with $n$ degrees of freedom corresponding to the dependent variate is divided in the usual way (as, for example, in [6]) into a part with $q$ degrees of freedom corresponding to the independent variate and the remaining part with $n - q$ degrees of freedom. If $a_{ij}$, $b_{ij}$ denote the sums of squares and products corresponding to this division, then it is known that the joint distribution of $a_{ij}$ and $b_{ij}$, if the dependent vector variate is normal and is actually, in the statistical sense, independent of the second variate, is

(1)  $\dfrac{|A|^{\frac12(q-p-1)}\,|B|^{\frac12(n-q-p-1)} \exp\left\{-\frac12 \sum_{i=1}^{p} (a_{ii} + b_{ii})\right\} \prod_{i \le j} da_{ij}\, db_{ij}}{2^{\frac12 np}\, \pi^{\frac12 p(p-1)} \prod_{i=1}^{p} \Gamma[\frac12(q-i+1)]\, \Gamma[\frac12(n-q-i+1)]},$

where $|A|$ denotes the determinant of the matrix $A = [a_{ij}]$, and $\prod da_{ij}$ the product of differentials $da_{ij}$, and where for convenience the variance matrix of the dependent variate is taken to be the unit matrix.

We make the transformation specified by

(2)  $A = W D W', \qquad A + B = W W',$

where $D$ is a diagonal matrix of the quantities $r_i^2$ in descending order of magnitude, and $W = [w_{ij}]$ is a matrix (with transpose $W'$) uniquely determined by (2) except for an ambiguity of sign for each column; this ambiguity can be eliminated by choosing positive elements in the first row. The Jacobian $\Delta$ of the transformation may be shown to be

(3)  $\Delta = 2^p\, |WW'|^{\frac12(p+2)} \prod_{i<j} (r_i^2 - r_j^2).$


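By direct substitution, we obtain from (1) the distribution

$p(a_{ij}, b_{ij}) = p(w_{ij}, r_i) = p(w_{ij})\, p(r_i),$

where $p(x)$ is a general notation³ for a distribution function in one or more variates $x_i$ (including the differential elements); for $p(w_{ij})$ and $p(r_i)$ we have

(4)  $p(w_{ij}) = C_1\, |WW'|^{\frac12(n-p)} \exp\left\{-\frac12 \sum_{i,j} w_{ij}^2\right\} \prod_{i,j} dw_{ij},$

(5)  $p(r_i) = C_2 \prod_{i=1}^{p} \left\{ (r_i^2)^{\frac12(q-p-1)} (1 - r_i^2)^{\frac12(n-q-p-1)} \right\} \prod_{i<j} (r_i^2 - r_j^2) \prod_{i=1}^{p} dr_i^2,$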


³ The probability symbol is not of course to be confused with the number $p$ of components in the dependent variate. It should also be noted that for convenience $p(x_i)$ and $p(r_i)$ denote the joint probability for a set of quantities $x_i$, whereas $p(x_1)$ or $p(x_2)$ denotes the probability for the specified variate $x_1$ or $x_2$ considered separately.
the constants $C_1$ and $C_2$ being arranged to give unity on integration of $p(w_{ij})$ or $p(r_i)$, i.e. we have

(6)  $C_1 = \dfrac{2^p \prod_{i=1}^{p} \Gamma[\frac12(p-i+1)]}{2^{\frac12 np}\, \pi^{\frac12 p^2} \prod_{i=1}^{p} \Gamma[\frac12(n-i+1)]}$

(the $w_{ij}$ varying from $-\infty$ to $\infty$ except that $w_{1j} \ge 0$) and

(7)  $C_2 = \dfrac{\pi^{\frac12 p} \prod_{i=1}^{p} \Gamma[\frac12(n-i+1)]}{\prod_{i=1}^{p} \Gamma[\frac12(p-i+1)]\, \Gamma[\frac12(q-i+1)]\, \Gamma[\frac12(n-q-i+1)]}.$
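As a numerical check on the normalizing constant of the root distribution (5), the following is a minimal Python sketch. It assumes that $C_2$ has the standard form $\pi^{p/2} \prod_i \Gamma[\frac12(n-i+1)] / \{\Gamma[\frac12(p-i+1)]\, \Gamma[\frac12(q-i+1)]\, \Gamma[\frac12(n-q-i+1)]\}$ for the roots arising from two independent Wishart matrices with $q$ and $n - q$ degrees of freedom. The function names `log_c2` and `c2` are illustrative and do not come from the paper.

```python
import math


def log_c2(p, q, n):
    """Logarithm of the assumed constant C_2 of the null root distribution.

    p: number of components of the dependent variate (p <= q)
    q: degrees of freedom associated with the independent variate
    n: total degrees of freedom of the sample
    """
    total = 0.5 * p * math.log(math.pi)
    for i in range(1, p + 1):
        total += math.lgamma(0.5 * (n - i + 1))
        total -= math.lgamma(0.5 * (p - i + 1))
        total -= math.lgamma(0.5 * (q - i + 1))
        total -= math.lgamma(0.5 * (n - q - i + 1))
    return total


def c2(p, q, n):
    """C_2 itself; log-gamma is used to avoid overflow for large n."""
    return math.exp(log_c2(p, q, n))
```

For $p = 2$, $q = 3$ this expression reduces algebraically to $\frac14 (n-2)(n-3)(n-4)$, the constant that appears in the worked example of section 5, and for $p = 1$ it reduces to the reciprocal of a Beta function.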


3. Formal determination of the general distribution. The method to be adopted of obtaining the general distribution from the particular case quoted in equation (5) above is the same in principle as the one adopted by Fisher [7] in his derivation of the general distribution of the multiple correlation coefficient. Since the argument is more involved in the present problem, it will be presented first in formal probability terms, before the details of the solution are examined.

We consider a transformation of the components of each vector variate to the true canonical components. Let the observed ordinary correlation coefficients of these mutually independent components for one vector variate with the corresponding components of the second vector variate be denoted by $s_i$. The true correlations are the true canonical correlations $\rho_i$. Then we have for the general canonical correlation distribution, denoted by⁴ $p(r_i \mid \rho_i)$, the expression

$p(r_i \mid \rho_i) = \int p(r_i, s_i \mid \rho_i)$
$\qquad = \int p(r_i \mid s_i, \rho_i)\, p(s_i \mid \rho_i)$
$\qquad = \int p(r_i \mid s_i)\, p(s_i \mid \rho_i),$

the substitution $p(r_i \mid s_i)$ for $p(r_i \mid s_i, \rho_i)$ following from the sufficiency of the independent correlations $s_i$ of the corresponding pairs of canonical components, as statistics for the $\rho_i$. We now define the function $g(s_i, \rho_i)$ by the relation

$p(s_i \mid \rho_i) = p(s_i \mid \rho_i = 0)\, g(s_i, \rho_i),$

whence we have the general solution

(8)  $p(r_i \mid \rho_i) = \int p(r_i \mid s_i)\, p(s_1 \mid \rho_1 = 0)\, g(s_1, \rho_1)\, p(s_2 \mid \rho_2 = 0)\, g(s_2, \rho_2) \cdots$
$\qquad = \int p(r_i, s_i \mid \rho_i = 0)\, g(s_1, \rho_1)\, g(s_2, \rho_2) \cdots$
$\qquad = p(r_i \mid \rho_i = 0) \int p(s_i \mid r_i, \rho_i = 0)\, g(s_1, \rho_1)\, g(s_2, \rho_2) \cdots$

for $p(r_i \mid \rho_i)$ in terms of the special case $p(r_i \mid \rho_i = 0)$.

⁴ Quantities to the right of the vertical stroke in a probability bracket are quantities on which the probability distribution depends.
Now according as the independent vector variate is itself a normal variate with which the dependent variate is correlated, or is a fixed variate in the sample space (this includes the non-central means case), we have from [7] that the distribution of the multiple correlation $R$ of a single dependent variate with an independent variate comprising $m$ components is the distribution for zero true correlation multiplied by the factor

(a)  $g(R, \rho) = F(\tfrac12 n, \tfrac12 n; \tfrac12 m; \rho^2 R^2)\, (1 - \rho^2)^{\frac12 n},$

(b)  $g(R, \beta) = F(\tfrac12 n; \tfrac12 m; \tfrac12 \beta^2 R^2)\, e^{-\frac12 \beta^2},$

where we replace $\rho^2$ by a parameter $\beta^2$ in case (b), and the notation for hypergeometric functions used is

(9)  $F(\alpha, \beta; \gamma; x) = 1 + \dfrac{\alpha\beta}{\gamma}\, x + \dfrac{\alpha(\alpha+1)\beta(\beta+1)}{\gamma(\gamma+1)}\, \dfrac{x^2}{2!} + \cdots,$

$\qquad F(\alpha; \gamma; x) = 1 + \dfrac{\alpha}{\gamma}\, x + \dfrac{\alpha(\alpha+1)}{\gamma(\gamma+1)}\, \dfrac{x^2}{2!} + \cdots.$
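The two hypergeometric series and the factors of cases (a) and (b) can be evaluated by direct summation. The sketch below is only illustrative: it assumes the factors have the forms $F(\frac12 n, \frac12 n; \frac12 m; \rho^2 R^2)(1-\rho^2)^{n/2}$ and $F(\frac12 n; \frac12 m; \frac12 \beta^2 R^2) e^{-\beta^2/2}$, truncates each series after a fixed number of terms, and uses hypothetical function names (`hyp2f1_series`, `hyp1f1_series`, `g_case_a`, `g_case_b`) and the name `beta` for the case (b) parameter.

```python
import math


def hyp2f1_series(a, b, c, x, terms=500):
    """Truncated series 1 + (a b / c) x + ...; requires |x| < 1 to converge."""
    total = 1.0
    term = 1.0
    for k in range(terms):
        # ratio of successive terms of the series
        term *= (a + k) * (b + k) * x / ((c + k) * (k + 1))
        total += term
    return total


def hyp1f1_series(a, c, x, terms=500):
    """Truncated confluent series 1 + (a / c) x + ...; converges for all x."""
    total = 1.0
    term = 1.0
    for k in range(terms):
        term *= (a + k) * x / ((c + k) * (k + 1))
        total += term
    return total


def g_case_a(R, rho, n, m):
    """Assumed case (a) factor: correlations proper."""
    return hyp2f1_series(0.5 * n, 0.5 * n, 0.5 * m, (rho * R) ** 2) * (1.0 - rho * rho) ** (0.5 * n)


def g_case_b(R, beta, n, m):
    """Assumed case (b) factor: fixed independent variate or non-central means."""
    return hyp1f1_series(0.5 * n, 0.5 * m, 0.5 * (beta * R) ** 2) * math.exp(-0.5 * beta * beta)
```

Both factors reduce to unity when the parameter is zero, as they must if they multiply the null distribution. The truncated sum for case (a) converges slowly when $\rho^2 R^2$ is close to one, so the number of terms may need to be increased there.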

It follows that we may write $g(s_i, \rho_i)$ above in the form

(10)  (a)  $g(s_i, \rho_i) = F(\tfrac12 n, \tfrac12 n; \tfrac12; \rho_i^2 s_i^2)\, (1 - \rho_i^2)^{\frac12 n},$

$\qquad$ (b)  $g(s_i, \beta_i) = F(\tfrac12 n; \tfrac12; \tfrac12 \beta_i^2 s_i^2)\, e^{-\frac12 \beta_i^2},$

by putting $m = 1$ in (9) (the signs of the $s_i$ are arbitrary, so that we are essentially concerned, as in the multiple correlation distribution, with the squares of the correlations). From these series expansions the integral in (8) is expressible in terms corresponding to the conditional moments, for sets of values of $0 \le t_1, \cdots, t_p$,

(11)  $\mu(t_i) \equiv E\{(s_1^2)^{t_1} (s_2^2)^{t_2} \cdots (s_p^2)^{t_p} \mid r_i\} = \int (s_1^2)^{t_1} (s_2^2)^{t_2} \cdots (s_p^2)^{t_p}\, p(s_i \mid r_i, \rho_i = 0).$

In the particular case when only $\rho_1 \ne 0$, the moments $\mu(t_1) \equiv E\{(s_1^2)^{t_1} \mid r_i\}$ from the single factor $g(s_1, \rho_1)$ are all that arise, but in the general case, it is important to notice that the quantities $s_i^2$, while statistically independent when unconditional, are no longer independent for the conditional distribution $p(s_i \mid r_i, \rho_i = 0)$.

This completes the formal solution. It remains to evaluate $\mu(t_i)$.

4. The conditional moment $\mu(t_1, \cdots, t_p)$. First of all we note, for any choice of the components of the dependent vector variate, applying the analysis of section 2 to such components, that the multiple correlation $R_i$ between the $i$th component and the $q$ components of the independent variate is given by

$R_i^2 = a_{ii}/(a_{ii} + b_{ii}) = \alpha_{i1}^2 r_1^2 + \alpha_{i2}^2 r_2^2 + \cdots + \alpha_{ip}^2 r_p^2,$

where

(12)  $\alpha_{ij}^2 = w_{ij}^2 / (w_{i1}^2 + w_{i2}^2 + \cdots + w_{ip}^2).$
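The identity for $R_i^2$ follows directly from the transformation $A = WDW'$, $A + B = WW'$ of section 2, and can be confirmed numerically. The sketch below assumes that transformation, draws an arbitrary matrix $W$ and arbitrary squared roots $r_j^2$, and compares $a_{ii}/(a_{ii} + b_{ii})$ with $\sum_j \alpha_{ij}^2 r_j^2$. The function name and the random inputs are illustrative only.

```python
import random


def max_identity_error(p=3, seed=1):
    """Largest |a_ii/(a_ii+b_ii) - sum_j alpha_ij^2 r_j^2| over i = 1..p."""
    rng = random.Random(seed)
    # arbitrary W and arbitrary squared roots in descending order
    W = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(p)]
    r2 = sorted((rng.random() for _ in range(p)), reverse=True)
    # A = W D W' with D = diag(r_j^2), and A + B = W W'
    A = [[sum(W[i][k] * r2[k] * W[j][k] for k in range(p)) for j in range(p)]
         for i in range(p)]
    C = [[sum(W[i][k] * W[j][k] for k in range(p)) for j in range(p)]
         for i in range(p)]
    worst = 0.0
    for i in range(p):
        lhs = A[i][i] / C[i][i]
        norm = sum(w * w for w in W[i])
        # alpha_ij^2 = w_ij^2 / sum_k w_ik^2
        rhs = sum((W[i][j] ** 2 / norm) * r2[j] for j in range(p))
        worst = max(worst, abs(lhs - rhs))
    return worst
```

Because the $\alpha_{ij}^2$ are non-negative and sum to one over $j$, each $R_i^2$ is a weighted mean of the $r_j^2$ and therefore lies between the smallest and largest squared root.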



To obtain the distribution of the $\alpha_{ij}$ from that of the $w_{ij}$, we note that the $w_{ij}$ in (4) would be normal and independent (allowing for convenience $w_{1j}$ to vary from $-\infty$ to $\infty$) except for the "linkage factor"

(13)  $|WW'|^{\frac12(n-p)}.$

If a new transformation in the variables $w_{ij}$ is defined by

$\chi_i^2 = \sum_j w_{ij}^2,$
$w_{i1} = \chi_i \cos\theta_{i1},$
$w_{i2} = \chi_i \sin\theta_{i1} \cos\theta_{i2},$
$w_{i3} = \chi_i \sin\theta_{i1} \sin\theta_{i2} \cos\theta_{i3},$
$\cdots$
$w_{ip} = \chi_i \sin\theta_{i1} \sin\theta_{i2} \cdots \sin\theta_{i,p-1},$

then $\chi_i$, $\theta_{ij}$, for normal $w_{ij}$ would all be independent with distributions:

$\chi_i^2$: a $\chi^2$ distribution with $p$ degrees of freedom,

$p(\theta_{ij}) \propto \sin^{p-j-1}\theta_{ij}\, d\theta_{ij},$

$(0 \le \theta_{ij} \le \pi,\ j = 1, 2, \cdots, p-2;\ 0 \le \theta_{i,p-1} \le 2\pi).$

The $\alpha_{ij}$ distribution then depends only on the $\theta_{ij}$ for given $i$, but the linkage factor results in an elevation of the $\chi_i^2$ distributions to $n$ degrees of freedom, and a linkage factor for the $\theta_{ij}$ distributions of

$\Lambda^{\frac12(n-p)},$

where

$\Lambda = |WW'| \Big/ \prod_{i=1}^{p} \chi_i^2$

is the determinant of the matrix $\left[\sum_k \alpha_{ik}\alpha_{jk}\right]$.


We may now, having obtained the distribution of the $\alpha_{ij}$, note their geometrical interpretation. Let us denote the $p$ components of the dependent variate in $n$-dimensional sample space by the $p$ vectors $\xi_i$ $(i = 1, 2, \cdots, p)$. Let the $p$ orthogonal canonical components corresponding to the sample canonical correlations $r_i$ be denoted by the $p$ unit vectors $x_1, x_2, \cdots, x_p$. Let the corresponding components for the independent variate be $y_1, y_2, \cdots$. The "linkage factor" merely represents the allowance that must be made in the mutual relations of the $\xi$-vectors for the fact that while they must lie in the $p$-space of the $x$-vectors, they really belong to the original $n$-space, so that the sum of squares of the coefficients in the equation

(14)  $\xi_i = w_{i1} x_1 + w_{i2} x_2 + \cdots + w_{ip} x_p,$

where

$\chi_i^2 = w_{i1}^2 + w_{i2}^2 + \cdots + w_{ip}^2,$

is a $\chi^2$ with $n$, and not $p$, degrees of freedom. If we now standardize the vector $\xi_i$ to be a unit vector, we have in place of (14)

(15)  $\xi_i = \alpha_{i1} x_1 + \alpha_{i2} x_2 + \cdots + \alpha_{ip} x_p,$

with a projection, on the $q$-space of the $y$-vectors, of $\zeta_i$ say, where

$\zeta_i = \alpha_{i1} r_1 y_1 + \alpha_{i2} r_2 y_2 + \cdots,$

and hence, as already noted in the algebraic derivation,

$R_i^2 = (\zeta_i \cdot \zeta_i) = \alpha_{i1}^2 r_1^2 + \alpha_{i2}^2 r_2^2 + \cdots,$

where $(\xi \cdot \zeta)$ denotes a scalar product. The linkage factor indicates that the $\xi_i$ vectors in (15) are not independent in the $p$-space of the $x$-vectors, the distribution of their mutual configuration being determined in $n$-space.

This interpretation enables us to determine the moments of the distribution $p(s_i \mid r_i)$. For if corresponding to (15) we write

(16)  $\eta_i = \beta_{i1} y_1 + \beta_{i2} y_2 + \cdots + \beta_{iq} y_q,$

then

(17)  $s_i = \alpha_{i1}\beta_{i1} r_1 + \alpha_{i2}\beta_{i2} r_2 + \cdots + \alpha_{ip}\beta_{ip} r_p.$


If we are considering case (a), the relations of the $\eta_i$ to the $y$-vectors in $q$-space will be similar to the relations of the $\xi_i$ to the $x$-vectors in $p$-space. In case (b), however, the $\eta_i$, which represent the true canonical components of a set of fixed vectors, must remain strictly orthogonal to each other, although their relation to the $y$-vectors can vary. This means that the relations of the $\eta_i$ to the $y$-vectors are determined by a random rotation of a rigid orthogonal set of $\eta_i$ vectors in case (b). We may note that if in case (a) we allowed $n$ to tend to infinity, the $\eta_i$ would also become rigidly orthogonal, so that the results for case (b) may conveniently be obtained from case (a) by retaining the distribution of the $\alpha_{ij}$, and for the $\beta_{ij}$ letting $n \to \infty$.

Thus in either case the moments of the $s_i$ can be obtained from (17), in terms of the moments of $\alpha_{ij}$ and $\beta_{ij}$, two independent sets of coefficients for which the distribution is known. To give a direct evaluation and complete the required solution for $E\{(s_1^2)^{t_1} (s_2^2)^{t_2} \cdots (s_p^2)^{t_p}\}$, each distribution and corresponding linkage factor can be written in terms of the angular variables. This method is unfortunately too cumbersome algebraically to be of any practical value except in the case of one non-zero root. This case is considered separately before the general case is discussed further.


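5. The case of only one non-zero root. Here we only require $\mu(t_1)$ and a comparatively simple solution is possible, the linkages within the $\xi_i$ and $\eta_i$ sets being irrelevant. We have in fact, if $\psi$ is the angle between $\eta_1$ and $\zeta_1$ (where $\zeta_1$ was the projection of $\xi_1$ in the $y$-space), that $\psi$ is a random angle in the $q$-space, since the $\xi_i$ and $\eta_i$ sets are independent. Hence in this particular case we may conveniently write $s_1^2 = R_1^2 \cos^2\psi$, which is just the transformation used to obtain the distribution of the multiple correlation $R_1$. Thus we may replace $g(s_1, \rho_1)$ in (10) by $g(R_1, \rho_1)$ with $m = q$, where $R_1^2 = \alpha_{11}^2 r_1^2 + \alpha_{12}^2 r_2^2 + \cdots + \alpha_{1p}^2 r_p^2$, and

$E\{(R_1^2)^t \mid r_i\} = \sum \dfrac{t!}{u_1!\, u_2! \cdots u_p!}\, r_1^{2u_1} r_2^{2u_2} \cdots r_p^{2u_p}\, E\{\alpha_{11}^{2u_1} \alpha_{12}^{2u_2} \cdots \alpha_{1p}^{2u_p}\},$

$\alpha_{11}^{2u_1} \alpha_{12}^{2u_2} \cdots = \cos^{2u_1}\theta_{11}\, \sin^{2(t-u_1)}\theta_{11}\, \cos^{2u_2}\theta_{12}\, \sin^{2(t-u_1-u_2)}\theta_{12} \cdots,$

where the expected value of the trigonometric term is evaluated as

(18)  $\dfrac{\Gamma(\tfrac12 p)\, \Gamma(u_1 + \tfrac12)\, \Gamma(u_2 + \tfrac12) \cdots \Gamma(u_p + \tfrac12)}{\pi^{\frac12 p}\, \Gamma(t + \tfrac12 p)}.$

We have now obtained the distribution, for $\rho_1 \ne 0$,

$p(r_i \mid \rho_1) = p(r_i \mid \rho_i = 0)\, G(r_i, \rho_1),$

where $p(r_i \mid \rho_i = 0)$ is given by (5); and in case (a)

$G(r_i, \rho_1) = (1 - \rho_1^2)^{\frac12 n} \sum \dfrac{\Gamma^2(\tfrac12 n + t)\, \Gamma(\tfrac12 p)\, \Gamma(\tfrac12 q)}{\Gamma^2(\tfrac12 n)\, \Gamma(\tfrac12 p + t)\, \Gamma(\tfrac12 q + t)} \prod_{j=1}^{p} \dfrac{\Gamma(u_j + \tfrac12)}{\Gamma(\tfrac12)}\, \dfrac{(\rho_1^2 r_j^2)^{u_j}}{u_j!},$

and in case (b)

$G(r_i, \beta_1) = e^{-\frac12 \beta_1^2} \sum \dfrac{\Gamma(\tfrac12 n + t)\, \Gamma(\tfrac12 p)\, \Gamma(\tfrac12 q)}{\Gamma(\tfrac12 n)\, \Gamma(\tfrac12 p + t)\, \Gamma(\tfrac12 q + t)} \prod_{j=1}^{p} \dfrac{\Gamma(u_j + \tfrac12)}{\Gamma(\tfrac12)}\, \dfrac{(\tfrac12 \beta_1^2 r_j^2)^{u_j}}{u_j!},$

where $u_1 + u_2 + \cdots + u_p$ is denoted by $t$, and $\sum$ denotes summation of all $u_j$ from $0$ to $\infty$. The solution in either case contains a generalized hypergeometric function. If we denote the general series

$\sum \dfrac{\Gamma(\alpha_1 + t)\, \Gamma(\alpha_2 + t)\, \Gamma(\gamma_1)\, \Gamma(\gamma_2)}{\Gamma(\alpha_1)\, \Gamma(\alpha_2)\, \Gamma(\gamma_1 + t)\, \Gamma(\gamma_2 + t)} \prod_{j=1}^{p} \dfrac{\Gamma(\beta_j + u_j)}{\Gamma(\beta_j)}\, \dfrac{x_j^{u_j}}{u_j!}, \qquad \sum \dfrac{\Gamma(\alpha + t)\, \Gamma(\gamma_1)\, \Gamma(\gamma_2)}{\Gamma(\alpha)\, \Gamma(\gamma_1 + t)\, \Gamma(\gamma_2 + t)} \prod_{j=1}^{p} \dfrac{\Gamma(\beta_j + u_j)}{\Gamma(\beta_j)}\, \dfrac{x_j^{u_j}}{u_j!}$

by

$F(\alpha_1, \alpha_2; \beta_1, \beta_2, \cdots, \beta_p; \gamma_1, \gamma_2; x_1, x_2, \cdots, x_p), \qquad F(\alpha; \beta_1, \beta_2, \cdots, \beta_p; \gamma_1, \gamma_2; x_1, x_2, \cdots, x_p)$

respectively (see [8]), we have in case (a)

(19)  $p(r_i \mid \rho_1 \ne 0) = p(r_i \mid \rho_i = 0)\, (1 - \rho_1^2)^{\frac12 n}\, F(\tfrac12 n, \tfrac12 n; \tfrac12, \tfrac12, \cdots, \tfrac12; \tfrac12 p, \tfrac12 q; \rho_1^2 r_1^2, \cdots, \rho_1^2 r_p^2),$

and in case (b)

(20)  $p(r_i \mid \beta_1 \ne 0) = p(r_i \mid \rho_i = 0)\, e^{-\frac12 \beta_1^2}\, F(\tfrac12 n; \tfrac12, \tfrac12, \cdots, \tfrac12; \tfrac12 p, \tfrac12 q; \tfrac12 \beta_1^2 r_1^2, \cdots, \tfrac12 \beta_1^2 r_p^2).$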
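The conditional moments $E\{(R_1^2)^t \mid r_i\}$ that feed the series above can be sketched as a finite multinomial sum. The code below is a minimal illustration, assuming that the direction cosines $(\alpha_{11}, \cdots, \alpha_{1p})$ behave as a uniformly random unit vector in $p$ dimensions, so that the expected trigonometric term equals $\Gamma(\frac12 p) \prod_j \Gamma(u_j + \frac12) / \{\pi^{p/2}\, \Gamma(t + \frac12 p)\}$. The function names are hypothetical.

```python
import itertools
import math


def trig_expectation(u):
    """Assumed E{prod_j alpha_1j^(2 u_j)} for a random unit vector in p dimensions."""
    p = len(u)
    t = sum(u)
    log_value = (math.lgamma(0.5 * p) - 0.5 * p * math.log(math.pi)
                 - math.lgamma(t + 0.5 * p)
                 + sum(math.lgamma(uj + 0.5) for uj in u))
    return math.exp(log_value)


def moment_r1_squared(t, r2):
    """E{(sum_j alpha_1j^2 r_j^2)^t} given the squared sample roots r2 = [r_1^2, ..., r_p^2]."""
    p = len(r2)
    total = 0.0
    # enumerate all exponent vectors (u_1, ..., u_p) with u_1 + ... + u_p = t
    for u in itertools.product(range(t + 1), repeat=p):
        if sum(u) != t:
            continue
        coef = math.factorial(t)
        for uj in u:
            coef //= math.factorial(uj)  # multinomial coefficient, exact at each step
        term = coef * trig_expectation(u)
        for uj, rj2 in zip(u, r2):
            term *= rj2 ** uj
        total += term
    return total
```

Two simple consequences give a sanity check: when all $r_j^2 = 1$ the moment must equal one for every $t$, and for $t = 1$ it is the plain average of the $r_j^2$.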


An alternative operational form ist nbtmrw*! ibvi iv ■ J 

given t - Ui + Hi 4- ■ • ■ f «„ is g<-u< t;su-i I,;> u > ,«■■» ..f i? <■ . t r : 

rr-i -*%:? • 

where for dcfitiitctipftK wo wjiihpIw <-w f«.i Il< i >w ■>« . * 

^(“ii «s| b, o; j) £, ! *««. J-M;>n * >' 

i 'ill i 1 rt» ' 1 ' , l f 1 r . > 

we have 

/' (4»j \n\ i» ai • •» , I 9 ; pjrj , » r 

( 21 ) 

• |p„ ; If 

where 6 denotes the operation of taking <!««* lrrtn <4 ; . <,<„» n-,, »•'«,« 

possibly be done by multiplication by s ’ mi evdoste » <4 a mitji v „ V !« m 
integral but in the use of this formula hn< ib»- «.j«-r;tiv.r, m jm* Urn ,■«. , i 3 

directly), 

It is of some interest to examine a simple case in detail, e.g. to check that

$\int p(r_i \mid \rho_1) = 1.$

If we take $p = 2$, $q = 3$, we obtain for $p(r_1^2, r_2^2)$

$\tfrac14 (n-2)(n-3)(n-4)\, (1 - r_1^2)^{\frac12(n-6)} (1 - r_2^2)^{\frac12(n-6)} (r_1^2 - r_2^2)\, dr_1^2\, dr_2^2.$

In the case $n = 6$, we obtain on integration of $r_2^2$ from $0$ to $r_1^2$


p(n 1 pi) = 8(t?)* dr?(l - ,;if V s> rTO + <»] 

^ " * rrai 


U , 


l TO) J 

. , r W i‘i + owf '5 1 ' 

^l+fj n^t\ J lOj^'iSiy 4 ' 

»«<-«, + %. No, trom Ike identity {I . X| M 3 .,l . , , 
in the expansion of the left-hand side. This

"o* jd<nhfy, i«<T 01 f • 11. 

V. r 'i * _ rf| 4/+i) r(i + / 4- 2 ) 

t -i: ' mm -p m “ n^a + 2,11 

, i<l 4- 3)I’ff + i) 

Mi»r(z + a> * 


y * ) * vrj 1 ( Uf) f -Ml ivJ i <1 
i ivji/tt, i 2H ■" 3 iv*ms-f.V 


IT; 


?* j , *» 1 f- 1 »/’5 1 


in 


; -c~* i <4 
’ £-*' 1 tv 


m - 1 - tj u -1 :i) ? . f 
fr>i > i> 


jr.'T 


i’ « 


'*' r id V, ra ; + :i ) 

^ ndjz! 

< *<» d >?«j d 'dr?54s'? ; , «l •- ea ri t 

v 5 "'h-iujo. 1 " r ur(v >-ju>s l >*jtrati.>n of ri from U to 1, In purely nlKftUrnie 

1 ■ .“On 


v*„ •! * ei i 5 , rj'' </r, )1 

I!** rusdwj/ , ”i, .1'isj; r < d mmunin** *21 \ we have for the, same oaw /; — 2, 

*, .5, n »?*.- •< .iiMii 

>2:. «, <; 'ff*,-Mi », /i M Ar *n 

-utli (* ’S^'i "n «. u t«« ri , we nlifum 

r. t ;2lj lid - PI AJ| f ’ 1]\ 

r 


I'Jlh t'tfr'ys *1, i ,.* ' i - p\ri?i *<"«’ J* 

O’) • 


u . ») ) 

Him »}.*• tin* irrational expumnm »1 • - i>\r\zy esuras, 

oi,<S le*in*«.»«.,»%»'* «n ti-j'su - }«*ij'hnt «f z, \vi* ulitnm Hh* dihtrihutton ptrj j pjl 
iitvMr m ‘Si 1 <»r ;23 in ««f sin* appropriate lermt. Wi* may further 

integrate dim sly tie- * \pr< W.M 11 ,sh»*vr with rr-i'rrt So ri , atul after dimwling 

nyaeil* ur* Vtk aft* rutr. we i<S»i!nn 

*2* l>fS m f *>/' S»i, »1, I; » M.■ , 3 fl 1 pi»n. 

I i 

»■ rtowlify imwiaittw! i*< ’■•* unity. 

6. More than one non-zero root. In the general case the factor multiplying $p(r_i \mid \rho_i = 0)$ is rather remarkable in being symmetrical in both the set $r_i$ and the set $\rho_i$. As $n$ increases, the convergence of [...] the $\rho_i$ are also arranged in descending order of magnitude [...] restriction $r_1 > r_2 > \cdots > r_p$. The limiting distribution [...] by Hsu [9].

In view of the algebraic diflindtv «f obtaining 1 «4. , *1 , ,~j '' > <i ;*• t 

integration, an unsynunotrie method ><f «'btnimifK ri « W'nn </ * <, b.<‘ *-..j 

Tliis is fairly tractable, in the east* **f tn«« non ?*;<i i.>- '!;,<• - j.d v* 

of the original variables is transformed by mi orth-i# in] “4 ,u ■ ;-}< 
that the first new variable of the second (*•» n» d« i* -uom-i !n '!•<- i,< n 

between xo\j and wi ,. We may write, for evnnjde, 

ton — (wnteji + U‘ijM'2j HP ***1 («j| i wb 1 , \ 


( 28 ) 


1023 — 


1023 


-Wjiftois + -f ?ei r "tr*?i 




i 3„ r 'i* « 


bob + • - • +• W|„Vuv *■ 
«n 

~ir )2 (tob -1-4 wipiv'tt 


»L 


* W 


f(u'u + • • ■ 4 


f 


*f|t 


which conversely we can at once exjirew a*n rrinbori **f ri,, 1 *},e 

Wij, (since the reciprocal of an orthogonal matrix i* •’imjdv »»-i *t jj 

we write 


«« “ «%i/l(«4i) 3 + (iCjj.l 1 4 » Hi},>*?, 

“21 = t0n/[(tfti) s -f {trjjd 4 4 ! j\ 


and write further 


a > ~ 003 ®n) <h — cos 8n , • • • t bi cos #5 S , 
ha = coa fijj , ■ ■ • , where an =» cos d' n , «4 2 ... ^iri #1, «*<»., ^ , 
we have in particular 

a 2 i = a x bi - b 2 V(I - a?) V( ! - /,?), 

(30) a M == aahi V(T^~af) + 0 , 0 $, \/(\ - l/j) 


1/1 V(1 - «l» V41 » y tl 


where the distribution of the a’s and b'n is proportional to 

{(1 - a?) ! <-«> da,} {(1 - da,} • ■. j(i - }<I _ r 


*‘diy 
For the reasons discussed in section 4, it will be noticed that only the distribution of $b_1$ in the $a$, $b$ set is affected by the linkage factor. By such methods the expressions

$\mu(1, 1) = E\{s_1^2 s_2^2 \mid r_i\}, \qquad \mu(2, 1) = E\{s_1^4 s_2^2 \mid r_i\}$

were fairly readily obtained. If we introduce the notation

$S_2 = \sum (r_i^2)^2, \qquad S_{11} = \sum (r_i^2 r_j^2), \quad \text{etc.},$

and also symbols for the products of the $\alpha$ and $\beta$ moments, viz.



a E \oiiciisj li jrfhdhK 



K\n ii«5;l /vjdiidjd, 


etc., we may list the moments $\mu(t_1, \cdots, t_p)$ as in Appendix I, which gives all moments up to the fourth order in terms of the $\alpha$ and $\beta$ moments (the numerical coefficients arise from the numbers of ways of forming the two-way partitions). "Half factors" corresponding to the $\alpha$ moments are listed in Appendix II against their appropriate symbol, the corresponding factors coming from the $\beta$ moments being obtained in case (a) by writing $q$ for $p$ and in case (b) by writing also⁵ $n = \infty$. Thus in case (a)


a' 1, It 

02 

and lit caw* d* 


n J 2 n ■) 2 . 

u]H)i t 2«, 4' 2)_ 

up }■ n -- 2 nq b n 2 

Jip-p j. 2 ) 1/1 • ■ 1 .nqitj + 2)t q 1). 

, [’ ““ (» ™ I>) 1 [" “ (» " <l) 11 2,S’ 

** L«/»t/t 4- 2Kp - 1)J L nqUj 4- 2i(r/ -- 1 )Jj " 11 ’ 


a'b 1 [ " * 2 . 1 1[ 1 „ lv. 

| up, p i 2<J l *M b 2 j_ 

02 ) 

, Jn/t'l' » "2)l q + l ) + 2(u - p \\ - s , 
\npip 4- 211 p ~ liq(q + 2 )(q - 1)J ’ 11 ' 

By means of the transformation (28) it is possible to develop the moments $\mu(t_1, t_2)$ in the case of two non-zero roots, though in obtaining the results quoted in Appendix II, where the formulae for $\mu(3, 1)$ and $\mu(2, 2)$ are included, it was found convenient to supplement this method with the devices mentioned in the

⁵ It should be remembered that we have assumed $p \le q$. If $p > q$, we interchange the dependent and independent vector variates, and hence must interchange $p$ and $q$ in the general formulae, $p$ now corresponding to the independent variate.
next section. In the case of more than two non-zero roots it is possible to carry out a further transformation [...] "partial" variates [...], where


. if.i, “!! * lir <;Jy 

V sr j s H o,%, t.! 1 ? e < f 1. 


b n -- ("Y. w '.t f °r' js ir„ 


O .J n-j, 


u> coefficient*. This mahhs '-t- P* 1 *, , »„ “-mi* *4 no* i .iru? i* ■, * * 

which the fiief is relatH to tie pm tin! r '•v* hdvu • o,. .'T * 1,1 n * •, , 

i.e. to the second correlation fiedor whn h *l<'i** n>< ! « * jj th« ♦u/srfjp 1 -o*< ’" • u 
This method is, however, agam ton <und*‘r., »»<« 1 . u »i u, M > h ••.>* ,«rd ■» »j.* . '• 
rapid method of evaluating a J h . h . ,.' P mg' n> uU a, .j< i I,; pia'len* 

luts not ItCen entirely solved to tin' mule ij '■ ■/)!"!.). f i‘>n *U lie'- 5 ,CJ “ij'.lo uph 
ill the eottehuiuiK scetiuu ;ue mriitiomd d« M* < < e.s.v 5 . ? «v .ms .'-ioj, 

and which enabled tic* ?«•* in** Pa the j« m.otiMqj d ed ^ h ■ 1,, .t■ ,< m 1, », j 

to Ik* completed and added <o Apjs'icfiv I? 


7. Relations among the $\alpha$-moments. [...] $\xi_i$ being random vectors in the $p$-space of the $x$-vectors, their mutual configuration being determined by [...] in $n$-space, provide relations among the $\alpha$-moments. Thus [...]

(34)  $\alpha_{i1}^2 + \alpha_{i2}^2 + \cdots + \alpha_{ip}^2 = 1,$

the correlation of any $\xi_i$ with a fixed vector in the $p$-space, e.g. $(x_1 + x_2)/\sqrt{2}$, is a random correlation in $p$-space, [...] with any other is a random correlation in $n$-space. The use of these properties is best illustrated by an example, and equations sufficient to determine the six $\alpha$-moments required for $\mu(1, 1, 1)$ will be derived.

jjj 2lJ J73 ’ * 

anaiittji , «ii(tiiO*j, <*'j|0.*,«aj. (tiioivfijionoiii , uj.o ;o.' 1 >* ? .n-, 1 , 1 4 ..^ 1 . ,i, t 

by $A$, $B$, $C$, $D$, $E$, $F$ respectively. Multiply the second-order quantities $\alpha_{11}^2 \alpha_{21}^2$, $\alpha_{11}\alpha_{12}\alpha_{21}\alpha_{22}$ by expression (34) for $i = 3$; since this expression is identically unity, the consequent mean values are unaltered. This gives the two equations


.-l + t p “• 1 ■ li \n * 2i J np'p r 2 J, 

f q fil A +3(p- D/( 4 </> - l-,\p ■ o lf ’ j 
st -j“ (}! — 1 )/f t 'itp l l! p 0;J) 

f '/♦ fop p t / 1 

The moment $A$ is the mean of the triple product of the squares of the correlations of $\xi_1$, $\xi_2$ and $\xi_3$ with $x_1$. The same value must be realized with any other fixed vector in the $p$-space, e.g. with either $(x_1 + x_2)/\sqrt{2}$ or $(x_1 + x_2 + \cdots + x_p)/\sqrt{p}$. This gives two relations


(30) 


.1 “ n i 1 ) 0 

iv + 1 )A - Mi - W) r i v 


2 ’ ’•<' * uE * K/’ 1 a 


A final linearly independent relation is obtained from the mean triple product of $(\xi_1 \cdot \xi_2)$, $(\xi_1 \cdot \xi_3)$, $(\xi_2 \cdot \xi_3)$, which depends solely on the internal configuration of $\xi_1$, $\xi_2$ and $\xi_3$, and is easily shown (e.g. choose $\xi_1$ to coincide with one of the original axes of the $n$-space) to be $1/n^2$. This gives

til?) pA 4 - 3 jiip — \\ 1 ) 4 pip • lt'p 2 t it*. 

The equations contained in (35), (36) and (37) determine $A$, $B$, $C$, $D$, $E$ and $F$.

Similar equations could evidently be constructed for the higher-order moments, e.g. for the terms required for $\mu(2, 1, 1)$ or $\mu(1, 1, 1, 1)$, but the numbers of such terms increase rapidly. From Appendix I it will be seen that there are 21 distinct terms in $\mu(2, 1, 1)$ and [...] in $\mu(1, 1, 1, 1)$.


Appendix I. 


* 1,1 


e 2, 1 




• I ‘J,V 
ON THE THEORY OF MARKOFF CHAINS

By Elliott W. Montroll

University of Pittsburgh

1 . Summary. Although thou* f^inH •wdornjTi'.u'. i,"*«n *.u 4 

probability of indop-nderp aj*»2 v■ lr.opo" 1. <• bfi. wb jhh 

for tho analysis of mosi of tlif p^Mon 1 st» 0,*» ■ <•. ] t «h<. »] <•* r^ -4 

Iirotlability of dependent Kent) Imi- b^m raiho r^<'■*{«'•I 'j j,n pjri ,;c.v,, 
investigations in tin.* “ubjerf wne puided,,* d l A|*.»<? ’3 > lie ■,* •.*< re, "i 

hat) extended the fundamental Jurni 4 lj<*. ,U'Ut<> *»■ /hsui*-, of rSTf.v, 

The most extensive f\|wvi«ii'ii *>f ib- M<f K.-. 5 a M I nf.‘! *'. t 1 

In the present paper w*- *bnli de\*d«ij» usub-d -■! « wiping F.t. i 
chains of dependent variable), j.nd j nd the p:< '*.■*« 4 . u .* », I v, 

functions. It will liet-lemo that i<*t *<-?*!. i • ?1 < ** <.,g< 

distribution function*. can e\pri.,.,H*d iu u mr< > 1 s! r ?,,j ,, •, >, v . , , ,j 

vectors of a certain njH‘r,»t«ir r^-j e \|;my *-$ *V ji.irt*,. .4-. .;j« , r^i },«.* 

lmve Ireen fl|iplir<l to probh-jtt* sn e’«!etts ^4 u,< • » s !•<-,, < 

mqwtftni ajipln-!iti«Mi ha* b*<‘U in.eil*' l.j I **j*'„.g «i ,s vln. ■( ,< 5 ■, r * * • 1 

(on the basis of n simplified iha* JMtrio i',iA «ii<rg, , t,;,j * 5 » 4 

solid with cf«i|>erative element* t»< .♦ ph.ifw ti j*< ,*5,,,, * 

Application of linear operator theory 'tin .-ugh ni ,p„ < <* ,»i«j n.^vT, < .]■, ,*, -p, 
probability chains has apparently ltwn made In- Ib^iitetitY [«j 

2. Introductory Remarks. Suppose there exists a chain of $n$ events each of which might lead to one of $\nu$ possible results, and which are related in such a manner that the probability of $n$ successive events leading to the chain of results

$\alpha_1, \alpha_2, \cdots, \alpha_n$

is proportional to

$P_n(\alpha_1, \alpha_2, \cdots, \alpha_n).$

The probability of a given function $F(\alpha_1, \alpha_2, \cdots, \alpha_n)$ having a value corresponding to the sequence of $\alpha$'s would be proportional to



where 


(la) Fm - E {F(« t ,a?,«, t ... ,ft„)P/»„(<*,.... tWA ;. 
and the summation extends over all values of

$\{\alpha\} = (\alpha_1, \alpha_2, \cdots, \alpha_n).$

The probability of a result $\alpha_1$ of the first event leading to a result $\alpha_n$ of the $n$th event is

(2)  $P_n(\alpha_1, \alpha_n) = (1/Z_n) \sum_{\alpha_2, \cdots, \alpha_{n-1}} P_n(\alpha_1, \alpha_2, \cdots, \alpha_n).$

In order to find the probability of a given function $F(\alpha_1, \cdots, \alpha_n)$ having a value between $\xi$ and $\xi + d\xi$ it is useful to know the moments and Thiele semi-invariants of $F(\alpha_1, \cdots, \alpha_n)$. Both of these functions of $F$ can be calculated from

(3)  $Z_n(x) = \sum_{\{\alpha\}} P_n(\alpha_1, \cdots, \alpha_n) \exp\{x F(\alpha_1, \alpha_2, \cdots, \alpha_n)\}.$

Obviously

(4)  $Z_n = \lim_{x \to 0} Z_n(x).$

It is known [...] that the $m$th Thiele semi-invariant is given by

(5)  $\Lambda_m = \lim_{x \to 0} \dfrac{\partial^m}{\partial x^m} \log Z_n(x).$
In the notation of CinmiT /(«), the eharaeteristie funetion of F, 

Ii bV; js defined so that <»(£ } hi - b'l$) is the probability that the function 
/’ «<;. ■ ,0,1 ]*»„• a Vitim* between £ < F(«j , • • ■ , «,) < f -p /i, then it i« well 

known tla< Ifijif (i<zi is eontimiouMkt x £ and x ■ £ -p h 

1 r T 11 r "'*V w{ 

«»i f»'i » /») ■ <7<£j lint exp(log/(w)Jdw 


whete 


Pkt t 


log/t«i 


V-' Am thill" 
£fi Hi; 


23 


rt-wtV 


A«(i«)"/w 


\ 


u(<J). 


When the derivative of frVfl with respect to £ exists, the probability of 

/’ (til , ‘ * ' , On) 


having a value Itetweeu {and j 1- d£ is 

(fibj ytf} d| «< Qt; of! df >> ^ lim I exp {]£ hjiwr/mlle^da. 

¥ *a$ ) 

From Wt 


(?) 


23 .4®(i«r/wl <r* 


-log Z„(0) + Urn e ^ 6 * log 'AM- 

Jt—O 


Since, for :t rom-taut r miicpt-jjiiHit < i j'. 
we have 

(«) r a ,,'iwr "' i w’ !<f(j A ■ *’ ■ 

anil from <0, 

m 1 ) fffc+ **-(;»$, 1 ihu f ,t!l *'“***''•&* 

T'klimtinit!* r tt, , Ji) mid f'l'j m<lie;d*' mn. i< * ■ j,t« u.u , 

t’fiain of correlated i<v*>ii»hhu» !«• oMnim-d ft--in m ho .wad*. <.{ * \\+ 

nmv introduce juoecdun-''* fi<t fS«»*deVnintcdc fi»f /„ ■ / Jot -t^-■v<*?.i"i jc o - J( , 
»f Pin | , • ■ ■ , «„*„ 

\Yhcn u is a continuous vamUb*, fto'j. di-, t ft,, 1 ^ m-i *i •* i, yi, ,, 

art* easily gcuersliwd Uv repheiw *?.«• !<ntuOoi5v»jo "i)»* .»<•,» t ,y, 

of tlu- «a liy integral*,ami i*y jcji] k.uk »}«• iu.»o,,% * ll( n- 5 i, C d,, j„< i« 

Ity integral r*ntmt»*ui* 


•> r i 

3. Simple Chains.

a. General theory. By a simple chain we shall mean a set of $n$ events each of which leads to one of $\nu$ possible results and which are related in such a manner that if the result of the $k$th event is $\alpha_k$, the probability of the $(k+1)$st event yielding a result $\alpha_{k+1}$ is proportional to $P(\alpha_k, \alpha_{k+1})$. This implies that the probability of the occurrence of the sequence of results

$\alpha_1, \alpha_2, \cdots, \alpha_n$

is

(10)  $\prod_{k=1}^{n-1} P(\alpha_k, \alpha_{k+1}) \Big/ \sum_{\{\alpha\}} \prod_{k=1}^{n-1} P(\alpha_k, \alpha_{k+1}),$

and the probability of a first result $\alpha_1$ leading to an $n$th result $\alpha_n$ is

(11)  $P_n(\alpha_1, \alpha_n) = \sum_{\alpha_2, \cdots, \alpha_{n-1}} \prod_{k=1}^{n-1} P(\alpha_k, \alpha_{k+1}) \Big/ \sum_{\alpha_1, \cdots, \alpha_n} \prod_{k=1}^{n-1} P(\alpha_k, \alpha_{k+1}).$

The summations are to be extended over all $\nu$ values of each of the $\alpha$'s indicated on the summation indices. Chains of this type are called Markoff chains after the first author who studied them systematically. From (1), the average value of a function $F(\alpha_1, \cdots, \alpha_n)$ is

(12)  $\bar F = \sum_{\{\alpha\}} F(\alpha_1, \cdots, \alpha_n) \prod_{k=1}^{n-1} P(\alpha_k, \alpha_{k+1}) \Big/ \sum_{\{\alpha\}} \prod_{k=1}^{n-1} P(\alpha_k, \alpha_{k+1}).$
Many chain functions $F(\alpha)$ of interest are either additive or multiplicative and of one of the forms

(13a)  $F_1(\alpha_1, \cdots, \alpha_n) = h(\alpha_1, \alpha_2) + h(\alpha_2, \alpha_3) + \cdots + h(\alpha_{n-1}, \alpha_n),$

(13b)  $F_2(\alpha_1, \cdots, \alpha_n) = f(\alpha_1, \alpha_2)\, f(\alpha_2, \alpha_3) \cdots f(\alpha_{n-1}, \alpha_n).$

In case (b) it is convenient to define a new function $h(\alpha_1, \alpha_2)$ by

(14)  $f(\alpha_1, \alpha_2) = \exp[h(\alpha_1, \alpha_2)]$

and in both cases to consider a function of the form

(15)  $Z_n(x) = \sum_{\{\alpha\}} \prod_{k=1}^{n-1} P(\alpha_k, \alpha_{k+1}) \exp[x h(\alpha_k, \alpha_{k+1})],$

for then the values of $F_1$ and $F_2$ averaged over the entire chain are respectively

(16a)  $\bar F_1 = \lim_{x \to 0} \dfrac{\partial \log Z_n(x)}{\partial x}$

and

(16b)  $\bar F_2 = Z_n(1)/Z_n(0).$
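The relation between the generating function $Z_n(x)$ and the chain averages can be illustrated by brute-force enumeration on a small chain. The sketch below assumes $Z_n(x) = \sum \prod_k P(\alpha_k, \alpha_{k+1}) \exp[x h(\alpha_k, \alpha_{k+1})]$ and compares the directly computed averages of the additive function $F_1$ and the multiplicative function $F_2 = \exp(F_1)$ with the derivative of $\log Z_n(x)$ at $x = 0$ (approximated by a central difference) and with the ratio $Z_n(1)/Z_n(0)$. The function names and the example matrices are illustrative only.

```python
import itertools
import math


def chain_weight(P, seq):
    """Unnormalized weight prod_k P(a_k, a_{k+1}) of one sequence of results."""
    w = 1.0
    for a, b in zip(seq, seq[1:]):
        w *= P[a][b]
    return w


def f_additive(h, seq):
    """Additive chain function F_1 = sum_k h(a_k, a_{k+1})."""
    return sum(h[a][b] for a, b in zip(seq, seq[1:]))


def z_n(P, h, x, n):
    """Z_n(x) by direct summation over all nu**n sequences."""
    nu = len(P)
    return sum(chain_weight(P, s) * math.exp(x * f_additive(h, s))
               for s in itertools.product(range(nu), repeat=n))


def direct_averages(P, h, n):
    """Weighted averages of F_1 and of F_2 = exp(F_1) over the whole chain."""
    nu = len(P)
    seqs = list(itertools.product(range(nu), repeat=n))
    z0 = sum(chain_weight(P, s) for s in seqs)
    mean_f1 = sum(chain_weight(P, s) * f_additive(h, s) for s in seqs) / z0
    mean_f2 = sum(chain_weight(P, s) * math.exp(f_additive(h, s)) for s in seqs) / z0
    return mean_f1, mean_f2


def averages_from_z(P, h, n, eps=1e-5):
    """The same averages obtained from Z_n(x) alone."""
    d_log_z = (math.log(z_n(P, h, eps, n)) - math.log(z_n(P, h, -eps, n))) / (2 * eps)
    ratio = z_n(P, h, 1.0, n) / z_n(P, h, 0.0, n)
    return d_log_z, ratio
```

The enumeration costs $\nu^n$ terms, so it is usable only for very short chains; its role here is to confirm the generating-function relations on a case small enough to check by hand.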

When $n$ is large, the direct calculation of $Z_n(x)$ may become difficult on account of the large number of variables involved. We now introduce a procedure that is based on the observation that $Z_n(x)$ is the sum of the elements of the $(n-1)$th power of the matrix

(17)  $P_x = \begin{pmatrix} P_x(1,1) & P_x(1,2) & \cdots & P_x(1,\nu) \\ P_x(2,1) & P_x(2,2) & \cdots & P_x(2,\nu) \\ \vdots & \vdots & & \vdots \\ P_x(\nu,1) & P_x(\nu,2) & \cdots & P_x(\nu,\nu) \end{pmatrix},$

whose elements $P_x(\alpha_1, \alpha_2)$ are defined as

(18)  $P_x(\alpha_1, \alpha_2) = P(\alpha_1, \alpha_2) \exp[x h(\alpha_1, \alpha_2)],$

where $\alpha_1$ and $\alpha_2$ range over the same set of values, each of the $\nu$ possible results being represented by a unique integer of the set $1, 2, \cdots, \nu$. Thus

(19)  $Z_n(x) = \text{sum of elements of } P_x^{\,n-1}.$

To employ this observation to advantage, we use the characteristic values and vectors of the matrix $P_x$. It is well known that if the characteristic values are simple and the characteristic vectors $\psi_{x,i}$ and $\varphi_{x,i}$
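satisfy the operator equations

(20a) $P_x \cdot \Psi_{i,x} = \lambda_{i,x} \Psi_{i,x}$

(20b) $\Phi_{i,x} \cdot P_x = \lambda_{i,x} \Phi_{i,x}$,

where $\lambda_{i,x}$ is the ith characteristic value of (17), then

$\Phi_{i,x} \cdot \Psi_{j,x} = \sum_{\alpha=1}^{\nu} \varphi_{i,x}(\alpha)\, \psi_{j,x}(\alpha) = 0$ when $i \neq j$.

We shall for convenience always assume that the ψ's and φ's are normalized,

$\Phi_{i,x} \cdot \Psi_{i,x} = 1$,

so that in general

(21) $\Phi_{i,x} \cdot \Psi_{j,x} = \delta_{ij}$, where $\delta_{ij} = 0$ when $i \neq j$ and $\delta_{ij} = 1$ when $i = j$.

It is well known from matrix theory that one can expand a matrix element as

(22) $p_x(\alpha, \beta) = \sum_{i=1}^{\nu} \lambda_{i,x}\, \psi_{i,x}(\alpha)\, \varphi_{i,x}(\beta)$

and that

(23) $\lambda_{i,x} = \Phi_{i,x} \cdot P_x \cdot \Psi_{i,x}$.

By substituting (22) into the expression for $Z_n(x)$ in terms of $P_x^{n-1}$, one can show that

(24) $Z_n(x) = \sum_{i=1}^{\nu} \lambda_{i,x}^{n-1} \Big\{ \sum_{\beta=1}^{\nu} \varphi_{i,x}(\beta) \Big\} \Big\{ \sum_{\alpha=1}^{\nu} \psi_{i,x}(\alpha) \Big\} = \sum_{i=1}^{\nu} \lambda_{i,x}^{n-1} (\Phi_{i,x} \cdot 1)(1 \cdot \Psi_{i,x})$.

Therefore $Z_n(x)$ can be determined from a knowledge of the characteristic vectors and values of the matrix $P_x$.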
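The statement that $Z_n(x)$ equals the sum of the elements of $P_x^{n-1}$ can be checked numerically on a small chain. The following is a minimal sketch in Python and is not part of the original paper; the function names and the three-state test chain (`P`, `H`) are illustrative assumptions, and results are labelled 0, 1, 2 rather than 1, 2, 3.

```python
from itertools import product
from math import exp


def z_brute_force(p, h, n, x):
    """Z_n(x) as the explicit sum over all nu**n sequences of results."""
    nu = len(p)
    total = 0.0
    for seq in product(range(nu), repeat=n):
        weight = 1.0
        for a, b in zip(seq, seq[1:]):
            weight *= p[a][b] * exp(x * h[a][b])
        total += weight
    return total


def z_matrix_power(p, h, n, x):
    """Z_n(x) as the sum of the elements of P_x**(n - 1)."""
    nu = len(p)
    px = [[p[a][b] * exp(x * h[a][b]) for b in range(nu)] for a in range(nu)]
    v = [1.0] * nu                      # column vector of ones
    for _ in range(n - 1):              # v <- P_x v, applied n - 1 times
        v = [sum(px[a][b] * v[b] for b in range(nu)) for a in range(nu)]
    return sum(v)


# Hypothetical three-state chain used only for the check.
P = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]
H = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
```

The matrix form needs on the order of n ν² operations, against ν^n terms for the direct sum, which is the practical reason for passing to the characteristic values of $P_x$.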

If there exists a largest characteristic root $\lambda_{1,x}$ such that

(25) $\lambda_{1,x} > |\lambda_{i,x}|$ if $i \neq 1$,

one can obtain some interesting results. Before deriving these, we shall give a sufficient condition (which is satisfied in many chains) for the existence of this inequality. Frobenius [11] has shown that if all the elements of a finite matrix are > 0, then the characteristic value of largest absolute value of this matrix is real, positive, and simple (nondegenerate). Thus, as long as ν is finite and $p_x(\alpha, \beta) > 0$ for all α and β, (25) is valid.

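We shall now prove that

(25a) $\lim_{n \to \infty} \left\{ \frac{Z_n(x)}{\lambda_{1,x}^{n-1} (\Phi_{1,x} \cdot 1)(1 \cdot \Psi_{1,x})} - 1 \right\} = 0$,

that is,

(25b) $Z_n(x) \sim \lambda_{1,x}^{n-1} (\Phi_{1,x} \cdot 1)(1 \cdot \Psi_{1,x})$.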


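First let us consider the case in which $P_x$ is a symmetrical matrix. Then $\psi_{i,x}(\alpha) = \varphi_{i,x}(\alpha)$, all the characteristic values are real, and

$Z_n(x) = \lambda_{1,x}^{n-1} (\Psi_{1,x} \cdot 1)^2 + \sum_{i \neq 1} \lambda_{i,x}^{n-1} (\Psi_{i,x} \cdot 1)^2$.

From Cauchy's inequality and (21)

$(\Psi_{i,x} \cdot 1)^2 = \Big| \sum_{\alpha} \psi_{i,x}(\alpha) \Big|^2 \le \nu \sum_{\alpha} \psi_{i,x}^2(\alpha) = \nu$.

Therefore,

$\Big| \sum_{i \neq 1} \lambda_{i,x}^{n-1} (\Psi_{i,x} \cdot 1)^2 \Big| \le \nu \sum_{i \neq 1} |\lambda_{i,x}|^{n-1} \le \nu(\nu - 1)\, |\lambda_{2,x}|^{n-1}$,

where $\lambda_{2,x}$ is the characteristic value of $P_x$ second largest in absolute value. This inequality yields

$\left| \frac{Z_n(x)}{\lambda_{1,x}^{n-1} (\Psi_{1,x} \cdot 1)^2} - 1 \right| \le \frac{\nu(\nu - 1)}{(\Psi_{1,x} \cdot 1)^2} \left| \frac{\lambda_{2,x}}{\lambda_{1,x}} \right|^{n-1}$

and (25a) (since $|\lambda_{2,x}/\lambda_{1,x}| < 1$) follows. When $P_x$ is not symmetrical, one can easily derive the analogous expression

$\left| \frac{Z_n(x)}{\lambda_{1,x}^{n-1} (\Phi_{1,x} \cdot 1)(1 \cdot \Psi_{1,x})} - 1 \right| \le \frac{A(\nu - 1)\, |\lambda_{2,x}/\lambda_{1,x}|^{n-1}}{(\Phi_{1,x} \cdot 1)(1 \cdot \Psi_{1,x})}$

where

$A = [\max_i |(\Phi_{i,x} \cdot 1)|]\,[\max_i |(1 \cdot \Psi_{i,x})|]$.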

For brevity, when x = 0, we write $\lambda_{i,x}$ as $\lambda_i$, $\Psi_{i,x}$ as $\Psi_i$ and $\Phi_{i,x}$ as $\Phi_i$. By summing (10) over all α's except $\alpha_1$, $\alpha_k$ and $\alpha_n$ we obtain the probability of an intermediate event leading to a result $\alpha_k$ if the results of the first and last events are known to have been $\alpha_1$ and $\alpha_n$. With the aid of (21) and (22) it is easy to show that this probability is exactly:


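(26) $\frac{\sum_{i,j} \lambda_i^{k-1} \lambda_j^{n-k}\, \psi_i(\alpha_1)\, \varphi_i(\alpha_k)\, \psi_j(\alpha_k)\, \varphi_j(\alpha_n)}{\sum_{i} \lambda_i^{n-1}\, \psi_i(\alpha_1)\, \varphi_i(\alpha_n)}$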


When n is very large, and when we have simultaneously n >> k >> 1, we can rewrite this equation to include only $\lambda_1$, and neglect all terms containing other i's and j's. This leads to the results

a) If the number of events, n, in a simple chain is very large, the probability $P_n(\alpha_k)$ of a kth event far removed from the first and the last, yielding a result $\alpha_k$ when $\alpha_1$ and $\alpha_n$ are unspecified is


i , n(oit) Mctk) vi.(m) / ($i ■ 1)(1 • 'I'r.) ■ 


(27) 
b) When k = n, the probability of the result $\alpha_1$ of the first event leading to the result $\alpha_n$ of the nth event is

(28a) $P_n(\alpha_1, \alpha_n) = \frac{\sum_{i} \lambda_i^{n-1}\, \psi_i(\alpha_1)\, \varphi_i(\alpha_n)}{\sum_{i} \lambda_i^{n-1} (\Phi_i \cdot 1)(1 \cdot \Psi_i)}$.

So, as n → ∞,

(28b) $P_n(\alpha_1, \alpha_n) \sim \frac{\psi_1(\alpha_1)\, \varphi_1(\alpha_n)}{(\Phi_1 \cdot 1)(1 \cdot \Psi_1)}$.

c) When there exists no knowledge concerning the result of the first event, the probability of the nth event yielding the result $\alpha_n$ is

(29) $P_n(\alpha_n) = \sum_{\alpha_1} P_n(\alpha_1, \alpha_n) \sim \varphi_1(\alpha_n) / (\Phi_1 \cdot 1)$.


In chains of sufficient length for (25) to be valid, the probability of $F(\alpha_1, \dots, \alpha_n)$ having a value between ξ and ξ + h has an especially simple asymptotic form. From (6) this probability is (if for a given n we let $T = a n^{1/2}$)

(30) $G(\xi + h) - G(\xi) = \lim_{a \to \infty} \frac{1}{2\pi i} \int_{-a n^{1/2}}^{a n^{1/2}} \frac{d\omega}{\omega}\, e^{-i\omega\xi} (1 - e^{-i\omega h}) \exp \{ i\omega A_1 - \tfrac{1}{2} \omega^2 A_2 - \cdots \}$

and from (25) and (5)

(31) $A_m \sim n \lim_{x \to 0} \partial^m \log \lambda_{1,x} / \partial x^m = n L_m$

if

(32) $L_m \equiv \lim_{x \to 0} \partial^m \log \lambda_{1,x} / \partial x^m$.

Letting $u = \omega n^{1/2}$, (30) becomes

(33) $G(\xi + h) - G(\xi) \sim \lim_{a \to \infty} \frac{1}{2\pi i} \int_{-a}^{a} \frac{du}{u}\, \{ e^{-i u M_1} - e^{-i u M_2} \}\, e^{-u^2 L_2 / 2}\, [1 + O(n^{-1/2})]$

where

(34a) $M_1 = (\xi - A_1)/n^{1/2}$, $M_2 = (\xi + h - A_1)/n^{1/2}$

(34b) $A_1$ = average value of $F(\alpha_1, \dots, \alpha_n)$ = $\bar{F}$.

Integrating (33)

(35) $G(\xi + h) - G(\xi) \sim (2\pi L_2)^{-1/2} \int_{M_1}^{M_2} e^{-M^2 / 2 L_2}\, [1 + O(n^{-1/2})]\, dM$.

As n → ∞ and h → 0

(35a) $G(\xi + h) - G(\xi) \sim h\, (2\pi n L_2)^{-1/2} \exp \{ -(\xi - \bar{F})^2 / 2 n L_2 \}$,

and the probability that ξ < F < ξ + h becomes Gaussian.

b. Examples of a simple chain. As an example of a simple Markoff chain let us consider an event which can lead to either of two possible results, say "-1" or "1". Further, let us suppose that the probability of a given result being followed by an identical one is p and by one of another type is (1 - p); that is,

$p(-1, -1) = p(1, 1) = p$
$p(-1, 1) = p(1, -1) = 1 - p$.

This chain would be encountered in an analysis of a sequence of tosses of a 
coin with a “memory” so that the probability of two successive tosses showing 
the same face of the coin would be p and that of showing opposite faces (1 — p). 

A question one might ask concerning such a chain is—What is the probability 
of the occurrence of a given number of transitions from one kind of result to 
another? In the chain of results 

-1, -1, -1, 1, 1, -1, 1, -1, -1, -1

there would be four transitions, one corresponding to each -1 followed by a 1 and to each 1 followed by a -1. The function giving the number of transitions in a sequence of n events is

(36) $F(\alpha_1, \dots, \alpha_n) = \sum_{i=1}^{n-1} h(\alpha_i, \alpha_{i+1})$

where

$h(-1, -1) = h(1, 1) = 0$
$h(-1, 1) = h(1, -1) = 1$.

Even though the α's are dependent, in this special case $h(\alpha_i, \alpha_{i+1})$ and $h(\alpha_{i+1}, \alpha_{i+2})$ are independent, so that (40) could have been obtained on this basis.

To apply the methods described in the beginning of this section we must find the characteristic values and vectors of the matrix

(37) $P_x = \begin{pmatrix} p & (1 - p)e^{x} \\ (1 - p)e^{x} & p \end{pmatrix}$

(the configuration index α has the value either -1 or 1 in this case instead of "1" and "2" as given in (17)). The characteristic values are the roots of the equation

$\begin{vmatrix} p - \lambda & (1 - p)e^{x} \\ (1 - p)e^{x} & p - \lambda \end{vmatrix} = 0$,

that is,

(38) $\lambda_{1,x} = p + (1 - p)e^{x}$; $|\lambda_{2,x}| = |p - (1 - p)e^{x}| < \lambda_{1,x}$,

and the characteristic vectors are

$\psi_{1,x}(\alpha) = 2^{-1/2}$ and $\psi_{2,x}(\alpha) = 2^{-1/2} \alpha$.

The φ and ψ vectors have the same components in this case because of the symmetry of the $P_x$ matrix. Clearly

$\lambda_1 = \lambda_{1,0} = 1$; $\lambda_2 = \lambda_{2,0} = 2p - 1$

$\psi_1(\alpha) = 2^{-1/2}$ and $\psi_2(\alpha) = 2^{-1/2} \alpha$.

From (26) we see that if the result of the first event in the chain is $\alpha_1$, and that of the nth event is $\alpha_n$, the probability of the kth event yielding the result $\alpha_k$ is

$\frac{[(2p - 1)^{k-1} \alpha_1 \alpha_k + 1]\,[1 + (2p - 1)^{n-k} \alpha_k \alpha_n]}{2\,[1 + (2p - 1)^{n-1} \alpha_1 \alpha_n]}$.

As k, n and (n - k) simultaneously get very large, $P_n(\alpha_k) \sim \tfrac{1}{2}$, independently of $\alpha_k$.

The probability of an initial result $\alpha_1$ leading to a final result $\alpha_n$ is (from 28a)

$P_n(\alpha_1, \alpha_n) = (\tfrac{1}{4})\,[1 + (2p - 1)^{n-1} \alpha_1 \alpha_n]$

so that

$P_n(1, 1) = P_n(-1, -1) = (\tfrac{1}{4})\,[1 + (2p - 1)^{n-1}]$

$P_n(-1, 1) = P_n(1, -1) = (\tfrac{1}{4})\,[1 - (2p - 1)^{n-1}]$.
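The end-to-end probabilities of the two-state chain can be checked directly from powers of the transition matrix. The sketch below is an illustration added here, not the author's procedure; it assumes the normalisation of (28a), in which the four pairs ($\alpha_1$, $\alpha_n$) together carry probability 1, so that the closed form carries the factor 1/4. The function names are hypothetical.

```python
def end_pair_probability(p, n, a1, an):
    """P_n(a1, an) from P**(n - 1), normalised over all four end pairs."""
    step = [[p, 1.0 - p], [1.0 - p, p]]      # rows and columns ordered (-1, +1)
    power = [[1.0, 0.0], [0.0, 1.0]]         # identity matrix, P**0
    for _ in range(n - 1):
        power = [[sum(power[i][k] * step[k][j] for k in range(2))
                  for j in range(2)] for i in range(2)]
    index = {-1: 0, 1: 1}
    total = sum(sum(row) for row in power)   # sum of all elements, Z_n(0) = 2
    return power[index[a1]][index[an]] / total


def end_pair_closed_form(p, n, a1, an):
    """(1/4) [1 + (2p - 1)**(n - 1) * a1 * an]."""
    return 0.25 * (1.0 + (2.0 * p - 1.0) ** (n - 1) * a1 * an)
```

As n grows, $(2p - 1)^{n-1}$ tends to zero for 0 < p < 1 and all four pairs approach probability 1/4, which is the content of (28b) for this chain.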

Now, to answer our original question regarding the probability distribution of the transition function (36),

$F(\alpha_1, \dots, \alpha_n) = \sum_{i=1}^{n-1} h(\alpha_i, \alpha_{i+1})$,

we use the expression for $Z_n(x)$ determined from (24)

(39) $Z_n(x) = 2\,[p + (1 - p)e^{x}]^{n-1}$.
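Since $Z_n(x)/Z_n(0) = [p + (1 - p)e^{x}]^{n-1}$, the coefficient of $e^{kx}$ is the probability of exactly k transitions, which is a binomial term. A minimal Python sketch (added for illustration; function names are assumptions) compares this with direct enumeration of all sequences of n events:

```python
from itertools import product
from math import comb


def transition_counts_brute_force(p, n):
    """Distribution of the number of transitions in a chain of n events."""
    dist = [0.0] * n
    for seq in product((-1, 1), repeat=n):
        weight = 0.5                      # 1 / Z_n(0) = 1/2 for every sequence
        k = 0
        for a, b in zip(seq, seq[1:]):
            if a == b:
                weight *= p               # result repeated
            else:
                weight *= 1.0 - p         # transition
                k += 1
        dist[k] += weight
    return dist


def transition_counts_binomial(p, n):
    """Coefficients of exp(k x) in [p + (1 - p) exp(x)]**(n - 1)."""
    return [comb(n - 1, k) * (1.0 - p) ** k * p ** (n - 1 - k) for k in range(n)]
```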
From (9) the probability of there being between ξ and ξ + h transitions in a sequence of n + 1 events is

G(( + h) ~ G(a) - L f _ ,“■*)(,> + (1 _ p)e*“J» da/ o> 

(40) 

__ .L / " y a! (1 — p) k v"~ 

2*a - •«■ si_q (n — ft) I ft 1 

Letting x = tJi/2 and mu-ranging 


(Hi + ft) - (Hi) 


1 y »!(i - />) k P n ~ k n 

**=0 (u - ft)!ftl 


(l+?({ + «), 


where J>(\) is the Dirirhlet integral 


m -1 

V J-x x 


o if | x | > i 
4 IM = 1 

1 I X I < 1. 


We therefore have, when [ξ + h] ≤ n,

(41) $G(\xi + h) - G(\xi) = \sum_{k=[\xi]+1}^{[\xi + h]} \frac{n!\,(1 - p)^{k} p^{n-k}}{(n - k)!\,k!}$.

Here [x] denotes the greatest integer not exceeding x. The sum is zero if [ξ + h] < [ξ + 1]. When [ξ + h] > n,

(42) $G(\xi + h) - G(\xi) = \sum_{k=[\xi]+1}^{n} \frac{n!\,(1 - p)^{k} p^{n-k}}{k!\,(n - k)!}$.


When n is large it is difficult to get a clear picture of the function G(ξ) from (41) and (42), so we shall develop asymptotic results for large n by using (6) instead of (9).

By employing (5), we see that (this section will be developed on the basis of n + 1 trials instead of n)

$A_1 = \bar{F} = n(1 - p)$

$A_2 = np(1 - p)$

$A_3 = np(1 - p)(2p - 1)$, etc.
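For n + 1 events, $\log Z(x) = \log 2 + n \log \lambda_{1,x}$ with $\lambda_{1,x} = p + (1 - p)e^{x}$, so each $A_m$ is n times the mth derivative of $\log \lambda_{1,x}$ at x = 0. The sketch below (an added illustration with hypothetical function names, using central differences rather than any method from the paper) estimates the per-step values, which should be close to 1 - p, p(1 - p) and p(1 - p)(2p - 1).

```python
from math import exp, log


def log_lambda1(x, p):
    """log of the largest characteristic value p + (1 - p) e^x of (37)."""
    return log(p + (1.0 - p) * exp(x))


def derivatives_at_zero(f, step=1e-2):
    """Central-difference estimates of f'(0), f''(0) and f'''(0)."""
    d1 = (f(step) - f(-step)) / (2.0 * step)
    d2 = (f(step) - 2.0 * f(0.0) + f(-step)) / step ** 2
    d3 = (f(2 * step) - 2.0 * f(step) + 2.0 * f(-step) - f(-2 * step)) / (2.0 * step ** 3)
    return d1, d2, d3
```

The finite-difference step is a compromise between truncation and rounding error; the estimates are approximate, not exact.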


Therefore, from (11) 


AG »* G({ 4- ft) - Git) 


1 f 

2 ri 


*e <ul " v,, (l 




exp (—§np(l — p)of — inp( 1 — p)(2p — lV 3 /6 — • • •] dco. 
Letting $u = \omega n^{1/2}$, we have


uff-Ai)/n^_ p n» f $*A A)’ 


A G=~( - [e 

2m Los it 

r, _ v(i - p)&p - + q ( u *) 

L Cm 1 \n J 


-|$iSp{| pi 


“it** Ml 


du 


where

$M_2 = (\xi + h - A_1)/n^{1/2}$
$M_1 = (\xi - A_1)/n^{1/2}$.

Since


f c auj c ft " du . 
J-cc 

i r wV “'c ,Xu du 

J-oo 


» (ir/fl) 1 exp (— x 2 / ttt) 

r-iXir 4 ( 1 __ X^\ vs 
“ 4a 1 *' 2 \ <>«/ 


we have for large n


AG 


(43a) 


[ 2 irp(l - p)]l 


/ MJ 

X*i"ip(l p) 

... 

(2p - 1)X ^ 


1 


2;>(1 — -p)n' 

As n → ∞ and h → 0, this becomes

h exp (-f£ - Pf/2?)(1 ~ f ))» 
[2irup(i - 7 j)] 4 


3/J(l - p) 




tlx. 


Oft + fc) - Oft) 


rsj __ 


(43b) 


l 


(2p - Dft - f) 

2/)(l - p)» 


+ 0 


(-;))■ 


A similar problem which occurs in statistics of high polymers can be stated abstractly as follows. Suppose there exists a sequence of events each of which leads to a translation of length a of a point, either to the right or to the left, and that the probability of a translation continuing in the same direction as its predecessor is p while that of changing its direction is (1 - p). After n translations what is the probability of a point being displaced a distance ξ from its origin?

If "-1" represents a translation to the left and "+1" a translation to the right,

$p(-1, -1) = p(1, 1) = p$

$p(-1, 1) = p(1, -1) = (1 - p)$.
The function giving the distance of the point from its origin after n displacements is (when α = ±1)

$F(\alpha_1, \dots, \alpha_n) = a \sum_{j=1}^{n} \alpha_j = \tfrac{1}{2} a \alpha_1 + h(\alpha_1, \alpha_2) + \cdots + h(\alpha_{n-1}, \alpha_n) + \tfrac{1}{2} a \alpha_n$

where

$h(1, 1) = a$, $h(-1, -1) = -a$
$h(1, -1) = h(-1, 1) = 0$.

Neglecting the terms $a\alpha_1/2$ and $a\alpha_n/2$ in $F(\alpha_1, \dots, \alpha_n)$, one can answer questions concerning this problem by evaluating $Z_n(x)$ as defined by (15). In this case $P_x$ has the form

$P_x = \begin{pmatrix} p e^{ax} & 1 - p \\ 1 - p & p e^{-ax} \end{pmatrix}$.

Its characteristic roots are

$\lambda_{1,x} = p \cosh ax + [p^2 \cosh^2 ax + (1 - 2p)]^{1/2}$

$|\lambda_{2,x}| = |p \cosh ax - [p^2 \cosh^2 ax + (1 - 2p)]^{1/2}| < \lambda_{1,x}$

and its characteristic vectors:

$\Psi_{1,x} = [(p - 1)^2 + (p e^{ax} - \lambda_{1,x})^2]^{-1/2}\, (p - 1,\; p e^{ax} - \lambda_{1,x})$

$\Psi_{2,x} = [(p - 1)^2 + (p e^{ax} - \lambda_{2,x})^2]^{-1/2}\, (p - 1,\; p e^{ax} - \lambda_{2,x})$.



Since

$\bar{F} = A_1 = \lim_{x \to 0} \partial \log Z_n(x) / \partial x$,

one can show in the present problem that $\bar{F} = 0$. Therefore, the probability of the translated point being a distance between ξ and ξ + h from the origin after (n + 1) translations is, as n → ∞ and h → 0,

$G(\xi + h) - G(\xi) \sim h\, (2\pi n L_2)^{-1/2}\, e^{-\xi^2 / 2 n L_2}$,

where $L_2$ is by (32):

$L_2 = \lim_{x \to 0} \partial^2 \log \lambda_{1,x} / \partial x^2 = a^2 p/(1 - p)$.
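The value $L_2 = a^2 p/(1 - p)$ can be checked numerically from the largest characteristic root. The following sketch is an added illustration (the function names and the finite-difference step are assumptions, not part of the paper):

```python
from math import cosh, log, sqrt


def lambda1(x, p, a):
    """Largest root p cosh(ax) + [p^2 cosh^2(ax) + 1 - 2p]^(1/2)."""
    c = cosh(a * x)
    return p * c + sqrt(p * p * c * c + 1.0 - 2.0 * p)


def l2_numeric(p, a, step=1e-3):
    """Central-difference estimate of d^2 log(lambda_1) / dx^2 at x = 0."""
    def f(x):
        return log(lambda1(x, p, a))
    return (f(step) - 2.0 * f(0.0) + f(-step)) / step ** 2
```

For p = 2/3 the estimate should be close to $2a^2$, the variance per step that appears in the polymer result.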


Thus,

$G(\xi + h) - G(\xi) \sim h\, [2\pi n a^2 p/(1 - p)]^{-1/2}\, e^{-\xi^2 (1 - p) / 2 n a^2 p}$.

When p = 2/3 this problem is equivalent to the determination of the probability distribution of the components in an arbitrary direction of the distance between the ends of a linear polymer. In this case

$G(\xi + h) - G(\xi) \sim h\, (4\pi a^2 n)^{-1/2} \exp (-\xi^2 / 4 n a^2)$,

a result obtained by Tobolsky [12] after a lengthy and complicated combinatory calculation.

Another type of simple chain is encountered in the determination of the "life span" of a particle which is displaced a unit distance to the right or left per unit time along a straight line until it collides with an absorbing boundary either -(q + 1) or (p + 1) units from the starting point. This problem has been analyzed by M. Kac using the methods discussed in the present paper. We shall generalize his results to include the effect of an attraction of the particle toward one end of the line so that displacements toward that end are more probable than those in the other direction.

Following the notation of Kac [13] we let $X_j$ represent the jth displacement, $m_j$ its length, and δ(m) the probability of a given displacement having the length m. Then,

δ(m) = s if m = 1
δ(m) = 1 - s if m = -1
δ(m) = 0 otherwise.

If N represents the life span of a particle, the probability of its exceeding n is

$\mathrm{Prob}\{N > n\} = \mathrm{Prob}\{-q \le X_1 \le p,\; -q \le X_1 + X_2 \le p,\; \dots,\; -q \le X_1 + X_2 + \cdots + X_n \le p\} = \sum \delta(m_1)\, \delta(m_2) \cdots \delta(m_n)$

where the summation extends over all integers $m_1, m_2, \dots, m_n$ such that

$-q \le m_1 \le p,\; -q \le m_1 + m_2 \le p,\; \dots,\; -q \le m_1 + m_2 + \cdots + m_n \le p$.

Defining the new set of variables

$\alpha_j = q + m_1 + m_2 + \cdots + m_j$ (j = 1, 2, ..., n)

we see that

$\mathrm{Prob}\{N > n\} = \sum \delta(\alpha_1 - q)\, \delta(\alpha_2 - \alpha_1) \cdots \delta(\alpha_n - \alpha_{n-1})$.

As before, if we introduce the P matrix (of p + q + 1 rows and columns)

$P = (\delta(\alpha - \beta)) = \begin{pmatrix} 0 & 1 - s & 0 & \cdots & 0 \\ s & 0 & 1 - s & \cdots & 0 \\ 0 & s & 0 & \cdots & 0 \\ \vdots & & & \ddots & 1 - s \\ 0 & 0 & \cdots & s & 0 \end{pmatrix}$

we obtain after applying the equivalent of (22)

$\mathrm{Prob}\{N > n\} = \sum_{j=1}^{p+q+1} \lambda_j^{n}\, \varphi_j(q) \sum_{\alpha_n = 0}^{p+q} \psi_j(\alpha_n)$

where $\lambda_j$ is the jth characteristic value of P, and $\psi_j$ and $\varphi_j$ are its associated characteristic vectors as defined by (19) and (20) (here the range of α starts from 0 instead of 1 as in (17) and (19)).

It is easy to show that the characteristic values of P are

$\lambda_j = 2\,[s(1 - s)]^{1/2} \cos \xi_j$ (j = 1, 2, ..., p + q + 1)

where

$\xi_j = \pi j/(p + q + 2)$

and that the components of the characteristic vectors are

$\psi_j(\alpha) = [2/(p + q + 2)]^{1/2}\, [s/(1 - s)]^{\alpha/2} \sin (\alpha + 1)\xi_j$ (α = 0, 1, ..., p + q)

and

$\varphi_j(\alpha) = [2/(p + q + 2)]^{1/2}\, [(1 - s)/s]^{\alpha/2} \sin (\alpha + 1)\xi_j$.
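The characteristic values $2[s(1 - s)]^{1/2} \cos \xi_j$ and the vectors $\psi_j$ can be verified by substituting them into $P \psi_j = \lambda_j \psi_j$. The sketch below is an added illustration; it assumes the convention $P[\alpha][\beta] = \delta(\alpha - \beta)$ with δ(1) = s and δ(-1) = 1 - s, and the function name is hypothetical.

```python
from math import cos, pi, sin, sqrt


def check_characteristic_pairs(s, p, q):
    """Largest residual of P psi_j = lambda_j psi_j over all j and alpha."""
    size = p + q + 1                       # alpha runs over 0, 1, ..., p + q
    # Assumed convention: P[alpha][beta] = delta(alpha - beta),
    # delta(1) = s (below the diagonal), delta(-1) = 1 - s (above it).
    P = [[0.0] * size for _ in range(size)]
    for alpha in range(size):
        if alpha >= 1:
            P[alpha][alpha - 1] = s
        if alpha + 1 < size:
            P[alpha][alpha + 1] = 1.0 - s
    worst = 0.0
    for j in range(1, size + 1):
        xi = pi * j / (p + q + 2)
        lam = 2.0 * sqrt(s * (1.0 - s)) * cos(xi)
        psi = [sqrt(2.0 / (p + q + 2)) * (s / (1.0 - s)) ** (alpha / 2.0)
               * sin((alpha + 1) * xi) for alpha in range(size)]
        for alpha in range(size):
            lhs = sum(P[alpha][beta] * psi[beta] for beta in range(size))
            worst = max(worst, abs(lhs - lam * psi[alpha]))
    return worst
```

The boundary rows work because $\sin (\alpha + 1)\xi_j$ vanishes at α = -1 and at α = p + q + 1, which is what fixes $\xi_j = \pi j/(p + q + 2)$.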


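Since

$\sum_{\alpha_n = 0}^{p+q} \psi_j(\alpha_n) = \left( \frac{2}{p + q + 2} \right)^{1/2} \frac{(1 - s)\,\{1 - (-1)^{j} [s/(1 - s)]^{(p+q+2)/2}\} \sin \xi_j}{1 - 2\,[s(1 - s)]^{1/2} \cos \xi_j}$

we finally have

$\mathrm{Prob}\{N > n\} = \frac{2^{n+1}\, s^{(n-q)/2}\, (1 - s)^{(n+q+2)/2}}{p + q + 2} \sum_{j=1}^{p+q+1} \frac{\{1 - (-1)^{j} [s/(1 - s)]^{(p+q+2)/2}\} \cos^{n} \xi_j\, \sin \xi_j\, \sin (q + 1)\xi_j}{1 - 2\,[s(1 - s)]^{1/2} \cos \xi_j}$.

When s = 1/2 this reduces to the result of Kac (* means summation is only over odd j's):

$\mathrm{Prob}\{N > n\} = \frac{2}{p + q + 2} \sum_{j}^{*} \cos^{n} \xi_j\, \sin (q + 1)\xi_j\, \cot \tfrac{1}{2}\xi_j$.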


4. Simple Chains With Restrictions. Often when studying chains of dependent events, certain functions averaged over the entire chains are known to be restricted between definite limits. That is, there might exist k functions $g_j(\alpha_1, \alpha_2, \dots, \alpha_n)$ such that

(44) $-\Delta G_j < G_j - g_j(\alpha_1, \dots, \alpha_n) < \Delta G_j$, (j = 1, 2, ..., k),

where the $G_j$'s and $\Delta G_j$'s are preassigned constants. To calculate averages of other functions (1) is no longer valid, for it is an unrestricted sum over all sets of α's, including those incompatible with (44). All unrestricted sums in this formula (and other similar ones) must be replaced by sums over only those sets of α's compatible with (44). Since it is sometimes more difficult to evaluate restricted sums than unrestricted ones, we shall apply an idea of Markoff [1] to the reduction of the former to the latter type.

Let us seek an explicit expression for a function $P_n^*(\alpha_1, \alpha_2, \dots, \alpha_n)$ which has the property:

$P_n^*(\alpha_1, \dots, \alpha_n) = P_n(\alpha_1, \dots, \alpha_n)$ when the α's are chosen so that (44) is satisfied for all j,

$P_n^*(\alpha_1, \dots, \alpha_n) = 0$ otherwise.


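Since the Dirichlet integrals

$\delta_j = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin (\rho_j \Delta G_j)}{\rho_j} \exp (i \rho_j y_j)\, d\rho_j$

have the property

$\delta_j = 1$ when $-\Delta G_j < y_j < \Delta G_j$
$\delta_j = 0$ otherwise,

$P_n^*(\alpha_1, \dots, \alpha_n) = \delta_1 \delta_2 \cdots \delta_k\, P_n(\alpha_1, \dots, \alpha_n)$

has the required character provided

$y_j = G_j - g_j(\alpha_1, \dots, \alpha_n)$.

The average value of a function $F(\alpha_1, \dots, \alpha_n)$ can be written in terms of the unrestricted sum

$\bar{F} = \sum_{\{\alpha_i\}} F(\alpha_1, \dots, \alpha_n)\, P_n^*(\alpha_1, \dots, \alpha_n) \Big/ \sum_{\{\alpha_i\}} P_n^*(\alpha_1, \dots, \alpha_n)$,

where the summation extends over the complete set of $\{\alpha_i\}$'s,

$\{\alpha_i\} = (\alpha_1, \alpha_2, \dots, \alpha_n)$.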

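As in the case of chains without auxiliary restrictions, a useful function is

(45) $Z_n(x) = \sum_{\{\alpha_i\}} P_n^*(\alpha_1, \dots, \alpha_n) \exp [x F(\alpha_1, \dots, \alpha_n)] = \frac{1}{\pi^k} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} S_n(x, \rho_1, \dots, \rho_k) \prod_{j=1}^{k} \left\{ \frac{\sin (\rho_j \Delta G_j)}{\rho_j}\, e^{i \rho_j G_j}\, d\rho_j \right\}$

where

$S_n(x, \rho_1, \dots, \rho_k) = \sum_{\{\alpha_i\}} P_n(\alpha_1, \dots, \alpha_n) \exp \Big[ x F(\alpha_1, \dots, \alpha_n) - i \sum_{j=1}^{k} \rho_j\, g_j(\alpha_1, \dots, \alpha_n) \Big]$.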
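When $F(\alpha_1, \dots, \alpha_n)$ and $\{g_j(\alpha_1, \dots, \alpha_n)\}$ are all additive or multiplicative functions of the form (13a) and (13b), say

$F(\alpha_1, \dots, \alpha_n) = \sum_{k=1}^{n-1} h(\alpha_k, \alpha_{k+1})$

$g_j(\alpha_1, \dots, \alpha_n) = \sum_{k=1}^{n-1} g_j(\alpha_k, \alpha_{k+1})$

and the probability chain is a simple one, $Z_n(x)$ reduces to a simple form. Suppose

$P_n(\alpha_1, \dots, \alpha_n) = \prod_{j=1}^{n-1} p(\alpha_j, \alpha_{j+1})$;

then following the derivation of (24), we have

(46) $S_n(x, \rho_1, \dots, \rho_k) = \sum_{i} \lambda_{i,x,\rho}^{n-1} (\Phi_{i,x,\rho} \cdot 1)(1 \cdot \Psi_{i,x,\rho})$

where $\lambda_{i,x,\rho}$, $\Psi_{i,x,\rho}$ and $\Phi_{i,x,\rho}$ are characteristic values and vectors of the matrix

$P_{x,\rho} = \begin{pmatrix} p_{x,\rho}(1,1) & \cdots & p_{x,\rho}(1,\nu) \\ \vdots & & \vdots \\ p_{x,\rho}(\nu,1) & \cdots & p_{x,\rho}(\nu,\nu) \end{pmatrix}$

and

$p_{x,\rho}(\alpha, \beta) = p(\alpha, \beta) \exp \Big\{ x h(\alpha, \beta) - i \sum_{j=1}^{k} \rho_j\, g_j(\alpha, \beta) \Big\}$.

Substitution of (46) into (45) allows one to calculate $Z_n(x)$.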


5. More Complicated Chains. In a chain of N events in which the result of each event depends on those of its n predecessors (n << N), the calculation of $Z_N(x)$ proceeds in essentially the same manner as in the case of a simple chain. Let the N events be divided into N/n sets of "grand events" of n simple events each (for simplicity we assume N is divisible by n; this can easily be avoided). Thus, if each simple event could lead to any one of ν possible results, a grand event could lead to any one of $\nu^n$ possible results, and a complicated chain becomes a simple chain of grand events with the result of each grand event depending on the preceding grand event. Quantitative calculations thus proceed formally in the same manner as in a simple chain.
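The grouping into grand events can be made concrete as follows. This is a minimal sketch added for illustration, under the assumption that grand events are consecutive non-overlapping blocks and that the dependence is given as a conditional probability `rule(history, result)` of the next result given the tuple of the preceding `order` results; all names, including `example_rule`, are hypothetical.

```python
from itertools import product


def grand_event_matrix(rule, nu, order):
    """Transition matrix between blocks of `order` consecutive results.

    rule(history, b) is the probability of result b given the tuple
    `history` of the `order` results that precede it.
    """
    blocks = list(product(range(nu), repeat=order))   # nu**order grand results
    matrix = []
    for previous in blocks:
        row = []
        for current in blocks:
            history = list(previous)
            weight = 1.0
            for b in current:
                weight *= rule(tuple(history), b)
                history = history[1:] + [b]           # slide the window by one
            row.append(weight)
        matrix.append(row)
    return blocks, matrix


def example_rule(history, b):
    """Hypothetical rule: repeat the result seen two events earlier w.p. 0.8."""
    return 0.8 if b == history[0] else 0.2
```

The resulting matrix has $\nu^n$ rows and columns, so the characteristic-value methods of the simple chain apply unchanged, at the cost of a larger matrix.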

6. Continuous Case. In this section we generalize, by studying an example, to the case in which each event in a simple chain may lead to any one of a continuum of results. The example is a problem arising in statistical mechanics of molecular chains.

Consider a linear chain of n identical molecules whose centers of mass remain at a set of fixed regularly spaced positions, but which may rotate about their centers of mass in a plane. Suppose that the potential energy of interaction between neighboring pairs of molecules is a function of the angles a specified axis of the molecules makes with the line connecting the centers of mass of the molecules; that is, the potential energy of interaction between pairs of adjacent molecules can be written as $V(\theta_j, \theta_{j+1})$. Assuming that forces are sufficiently short ranged that interaction between more distant neighbors can be neglected, Boltzmann's theorem states that the probability of the axis of the first molecule making an angle between $\theta_1$ and $\theta_1 + d\theta_1$ with the line of centers of the chain, the second between $\theta_2$ and $\theta_2 + d\theta_2$, and the nth between $\theta_n$ and $\theta_n + d\theta_n$ is proportional to

$\exp \Big[ -\frac{1}{kT} \{ V(\theta_1, \theta_2) + V(\theta_2, \theta_3) + \cdots + V(\theta_{n-1}, \theta_n) \} \Big]\, d\theta_1 \cdots d\theta_n$

where k is Boltzmann's constant and T is the absolute temperature. The contribution of the interaction to the thermodynamic properties of the chain can be derived from the partition function

(47) $Z_n = \int_0^{2\pi} \cdots \int_0^{2\pi} \exp \Big[ -\frac{1}{kT} \{ V(\theta_1, \theta_2) + \cdots + V(\theta_{n-1}, \theta_n) \} \Big]\, d\theta_1 \cdots d\theta_n$.

For example, the internal energy is

$E = \partial \log Z_n / \partial(-1/kT)$

and the specific heat is $c = \partial E/\partial T$.

It is to be noted that $Z_n$ is exactly the integral of the iterated kernel of the integral equation

(48) $\lambda \psi(\theta_1) = \int_0^{2\pi} \psi(\theta_2) \exp \Big\{ -\frac{1}{kT} V(\theta_1, \theta_2) \Big\}\, d\theta_2$.

If $V(\theta_1, \theta_2)$ is symmetrical in $\theta_1$ and $\theta_2$, this linear homogeneous integral equation has a set of orthonormal characteristic functions $\{\psi_j(\theta)\}$ such that

(49) $\int_0^{2\pi} \psi_j(\theta)\, \psi_k(\theta)\, d\theta = \delta_{jk}$.

To each of these characteristic functions there corresponds a characteristic value $\lambda_j$. Now it is well known that the kernel of (48) can be expanded as a series in its characteristic functions

$\exp \Big\{ -\frac{1}{kT} V(\theta_1, \theta_2) \Big\} = \sum_j \lambda_j\, \psi_j(\theta_1)\, \psi_j(\theta_2)$.

Introducing this expression into (47) and applying the orthogonality conditions (49), one obtains

(47a) $Z_n = \sum_j \lambda_j^{n-1} \Big[ \int_0^{2\pi} \psi_j(\theta)\, d\theta \Big]^2$.
Probably the most interesting example of a molecular chain of the type described above is a chain of magnetic dipoles which are restricted to rotate only in a plane. In that case

$V(\theta_j, \theta_{j+1}) = \frac{m^2}{r^3} [\cos (\theta_j - \theta_{j+1}) - 3 \cos \theta_j \cos \theta_{j+1}]$,

where m is the magnetic moment of each dipole and r is the distance between a pair of adjacent centers of mass. This potential function leads to the integral equation

$\lambda \psi(\theta_1) = \int_0^{2\pi} \psi(\theta_2) \exp \Big\{ -\frac{m^2}{r^3 kT} [\cos (\theta_1 - \theta_2) - 3 \cos \theta_1 \cos \theta_2] \Big\}\, d\theta_2$.

Since this equation is rather complicated to solve, we shall devote the rest of the section to a potential function of less physical interest, but which leads to a less formidable integral equation.

In studying hindered rotation of molecules, one sometimes uses potential functions of the form:

$V(\theta_j, \theta_{j+1}) = -\rho \cos (\theta_j - \theta_{j+1})$

where ρ is a constant. With this potential function (48) becomes

(50) $\lambda \psi(\theta_1) = \int_0^{2\pi} \psi(\theta_2) \exp [J \cos (\theta_1 - \theta_2)]\, d\theta_2$

where

$J = \rho/kT$.

The characteristic functions and characteristic values of (50) are easily found with the aid of the Fourier series for exp (J cos θ):

(51) $\exp (J \cos \theta) = I_0(J) + 2 \sum_{m=1}^{\infty} I_m(J) \cos m\theta$

where $I_m(J)$ is the mth Bessel function of imaginary argument:

$I_m(J) = \sum_{k=0}^{\infty} \frac{(\tfrac{1}{2} J)^{2k+m}}{(m + k)!\, k!}$.

From (51)

$\exp [J \cos (\theta_1 - \theta_2)] = I_0(J) + 2 \sum_{m=1}^{\infty} I_m(J) (\cos m\theta_1 \cos m\theta_2 + \sin m\theta_1 \sin m\theta_2)$.

Substituting this expression into (50) we have

$\lambda \psi(\theta_1) = \int_0^{2\pi} \psi(\theta_2) \Big\{ I_0(J) + 2 \sum_{m=1}^{\infty} I_m(J) (\cos m\theta_1 \cos m\theta_2 + \sin m\theta_1 \sin m\theta_2) \Big\}\, d\theta_2$.
Because of the orthogonality of the trigonometric functions, one can verify by direct substitution that the characteristic functions are

$\psi_0(\theta) = 1/(2\pi)^{1/2}$

$\psi_m^{(1)}(\theta) = \pi^{-1/2} \sin m\theta$; $\psi_m^{(2)}(\theta) = \pi^{-1/2} \cos m\theta$, (m = 1, 2, ...)

and the corresponding characteristic values are

$\lambda_0 = 2\pi I_0(J)$

$\lambda_m^{(1)} = \lambda_m^{(2)} = 2\pi I_m(J)$, m > 0.

Introducing these characteristic functions and values into (47a), we obtain the simple formula for the partition function:

$Z_n = 2\pi\, [2\pi I_0(J)]^{n-1}$.
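The closed form $Z_n = 2\pi [2\pi I_0(J)]^{n-1}$ can be compared with a direct numerical evaluation of the n-fold integral of $\exp [J \sum \cos (\theta_j - \theta_{j+1})]$. The sketch below is an added illustration and makes several assumptions: an equally spaced grid of angles (a rectangle rule, which is very accurate for smooth periodic integrands), a truncated power series for $I_0$, and hypothetical function names.

```python
from math import cos, exp, factorial, pi


def bessel_i0(j, terms=40):
    """I_0(J) from its power series sum_k (J/2)^(2k) / (k!)^2."""
    return sum((0.5 * j) ** (2 * k) / factorial(k) ** 2 for k in range(terms))


def partition_function_numeric(j, n, grid=64):
    """Z_n by repeated quadrature over an equally spaced grid of angles."""
    width = 2.0 * pi / grid
    theta = [width * i for i in range(grid)]
    kernel = [[exp(j * cos(t1 - t2)) for t2 in theta] for t1 in theta]
    v = [1.0] * grid
    for _ in range(n - 1):                # one kernel application per bond
        v = [width * sum(kernel[a][b] * v[b] for b in range(grid))
             for a in range(grid)]
    return width * sum(v)                 # final integration over theta_1


def partition_function_closed_form(j, n):
    """2 pi [2 pi I_0(J)]^(n - 1)."""
    return 2.0 * pi * (2.0 * pi * bessel_i0(j)) ** (n - 1)
```

Only the constant characteristic function contributes because every other $\psi_m$ integrates to zero over a full period, which is why a single Bessel function survives.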

The internal energy of the molecular chain is therefore

$E = \partial \log Z_n / \partial(-1/kT)$

$\;\; = -\rho (n - 1)\, I_1(J)/I_0(J)$,

and the specific heat is:

$c = \partial E/\partial T = k (n - 1) J^2 \Big\{ 1 - \frac{I_1(J)}{J\, I_0(J)} - \Big[ \frac{I_1(J)}{I_0(J)} \Big]^2 \Big\}$.
ON THE FIRST TWO MOMENTS OF THE MEASURE OF A RANDOM SET

By L. A. Santaló
Universidad Nacional del Litoral, Argentina

1. Introduction. In a recent paper [3] H. E. Robbins derived general formulas for the moments of the measure of any random set X, and applied the formulas to find the mean and the variance of a random sum of intervals on a line. In subsequent papers, J. Bronowski and J. Neyman [1], using other methods, found the variance when X is a random sum of rectangles in the plane, and H. E. Robbins [4] found the variance when X is a random sum of n-dimensional intervals in n-dimensional euclidean space. In the latter paper Robbins solved also the corresponding problem for circles on the plane.

Using the methods of Robbins, our purpose in the present paper is to solve the following similar problems:

(i) Let R denote the rectangle consisting of all points (x, y) such that $0 \le x \le A_1$, $0 \le y \le A_2$, and let R' denote the larger rectangle for which $-\delta \le x \le A_1 + \delta$, $-\delta \le y \le A_2 + \delta$. Let ρ denote a rectangle of fixed dimensions, a × b, but variable position in the plane. The position of ρ will be determined by the coordinates x, y of its center P and the angle φ between the side of length a and the x-axis. We suppose $(a^2 + b^2)^{1/2} \le \min (A_1, A_2, \delta)$. Let a fixed number N of rectangles ρ be chosen independently with the probability density function for the coordinates (x, y, φ) of each rectangle constant and equal to $1/\pi R'$ in the three-dimensional interval with base R' and height π and zero outside this interval. In section 3 we evaluate the first two moments of the measure of X, where X denotes the intersection of the set-theoretical sum of the N rectangles ρ with R.

(ii) Let R denote the n-dimensional interval consisting of all points $(x_1, x_2, x_3, \dots, x_n)$ such that $0 \le x_i \le A_i$, (i = 1, 2, ..., n), and let R' denote the larger interval for which $-\delta \le x_i \le A_i + \delta$. Let a fixed number N of n-dimensional spheres with radii r (such that $2r \le \min (A_i, 2\delta)$) be chosen independently, with the probability density function for the centre of each n-sphere constant and equal to 1/R' in R' and zero outside this interval. Denoting by X the intersection of the set-theoretical sum of the N n-spheres with R, we evaluate in section 4 the first two moments of the measure of X. This problem is a generalization to n-dimensional space of the case considered by Robbins for the plane (n = 2) in [4].

2. Preliminary formulas. Let K be an indeformable plane convex figure of variable position in the plane. The position of K may be determined by the coordinates (x, y) of a point P fixed within K and the angle φ which measures the rotation of K about P. We shall call x, y, φ the coordinates of K. The measure of a set of figures congruent with K is defined as being the integral of the differential form

(2.1) $dK = dx\, dy\, d\varphi$.

It is readily shown that this measure does not depend on the particular point P chosen to determine the position of K [5]. For instance, the measure of the set of figures K, each of which contains in its interior a fixed point Q, has the value 2πF, where F denotes the area of K; that is,

(2.2) $\int_{Q \in K} dK = 2\pi F$.

Let $P_1$ and $P_2$ be two fixed points and let l be the distance $P_1P_2$. The measure of the set of figures congruent with K, each of which contains both points $P_1$ and $P_2$ in its interior, will be a function of K and l, say μ(K, l). If d is the diameter of K, that is, the maximal distance between two points of K, we have μ(K, l) = 0 for l ≥ d.

Examples. Let K be a rectangle ρ of fixed dimensions a × b, and let us suppose a ≤ b. The diameter d of ρ is $d = (a^2 + b^2)^{1/2}$. Let P(x, y) be the centre of ρ and φ the angle which the side of length b forms with the segment $P_1P_2$ of length l. If we first keep φ constant, then in order that there exist positions of ρ in which it contains the segment $P_1P_2$ in its interior it is necessary that

$a - l \sin \varphi \ge 0$, $b - l \cos \varphi \ge 0$

and in this case the area covered by the centres P in all these positions has the value

$(a - l \sin \varphi)(b - l \cos \varphi)$.

Integrating over all permissible values of φ, we obtain

(2.3) $\mu(\rho, l) = 4 \int_{\arccos \{b/l\}}^{\arcsin \{a/l\}} (a - l \sin \varphi)(b - l \cos \varphi)\, d\varphi$

where we define

{x} = x if x ≤ 1
{x} = 1 if x > 1.

Carrying out the obvious integration in (2.3) we have

(2.4) μ(ρ, l) =

$2\pi ab - 4l(a + b) + 2l^2$ for l ≤ a ≤ b,

$4\,[ab \arcsin (a/l) - \tfrac{1}{2} a^2 - bl + b(l^2 - a^2)^{1/2}]$ for a ≤ l ≤ b,

$4\,[ab\,(\arcsin (a/l) - \arccos (b/l)) + b(l^2 - a^2)^{1/2} + a(l^2 - b^2)^{1/2} - \tfrac{1}{2}(a^2 + b^2) - \tfrac{1}{2} l^2]$ for a ≤ b ≤ l.
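The three cases of (2.4) can be checked against a direct numerical evaluation of the integral (2.3). The sketch below is an added illustration, not part of the paper; the function names and the midpoint rule are assumptions, and it presumes a ≤ b and l > 0.

```python
from math import acos, asin, cos, pi, sin, sqrt


def mu_numeric(a, b, l, steps=20000):
    """4 * integral of (a - l sin(phi)) (b - l cos(phi)) over permissible phi."""
    lower = acos(min(b / l, 1.0))         # arc cos {b/l}
    upper = asin(min(a / l, 1.0))         # arc sin {a/l}
    if upper <= lower:
        return 0.0                        # l is at least the diameter
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        phi = lower + (i + 0.5) * width   # midpoint rule
        total += (a - l * sin(phi)) * (b - l * cos(phi))
    return 4.0 * total * width


def mu_closed_form(a, b, l):
    """The three cases of (2.4), for a <= b and 0 < l < (a^2 + b^2)^(1/2)."""
    if l <= a:
        return 2.0 * pi * a * b - 4.0 * l * (a + b) + 2.0 * l * l
    if l <= b:
        return 4.0 * (a * b * asin(a / l) - 0.5 * a * a - b * l
                      + b * sqrt(l * l - a * a))
    return 4.0 * (a * b * (asin(a / l) - acos(b / l))
                  + b * sqrt(l * l - a * a) + a * sqrt(l * l - b * b)
                  - 0.5 * (a * a + b * b) - 0.5 * l * l)
```

The brace symbol {x} is what keeps the limits of integration inside the range where both factors of the integrand are non-negative; in the code it appears as `min(..., 1.0)`.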
As another example, let R be the rectangle consisting of all points (x, y) such that $0 \le x \le A_1$, $0 \le y \le A_2$ and let R' be the rectangle consisting of all points (x, y) such that

$-\delta \le x \le A_1 + \delta$, $-\delta \le y \le A_2 + \delta$, $(a^2 + b^2)^{1/2} \le \min (A_1, A_2, \delta)$.

Let us consider the set of rectangles ρ whose centers belong to R' and which do not contain either $P_1$ or $P_2$ ($P_1$ and $P_2$ being two fixed points which belong to R). Let l be the distance $P_1P_2$. According to (2.2) and the definition of μ(ρ, l) the measure of the set of rectangles ρ under consideration is

(2.5) $2\pi R' - 2 \cdot 2\pi\rho + \mu(\rho, l)$,

where $R' = (A_1 + 2\delta)(A_2 + 2\delta)$ and ρ = ab.

Let K be a plane convex figure, of fixed position in its plane. Let us suppose K to be translated a distance l in the direction θ, and let F(K, l, θ) be the area of the intersection of K with the translated figure. Obviously if d is the diameter of K, F(K, l, θ) = 0 for l ≥ d. In what follows we shall consider the function

(2.6) $\Phi(K, l) = \int_0^{2\pi} F(K, l, \theta)\, d\theta$.

Example. Let K be a rectangle R of sides $A_1$, $A_2$. Let the symbol [x], as in [1], be defined by

[x] = x if x ≥ 0
[x] = 0 if x < 0.

It is then readily seen that

(2.7) $F(R, l, \theta) = [A_1 - l\,|\sin \theta|]\,[A_2 - l\,|\cos \theta|]$.

For our purpose the case in which $l \le \min (A_1, A_2)$ is of interest. In this case, carrying out the immediate integrations, we obtain

(2.8) $\Phi(R, l) = 2\pi A_1 A_2 - 4l(A_1 + A_2) + 2l^2$.

Let $S_{n,r}$ be an n-dimensional sphere of radius r. $S_{n,r}$ will denote also the volume of this sphere, that is, as is known (see [2, p. 109]),

(2.9) $S_{n,r} = \frac{\pi^{n/2} r^n}{\Gamma(\tfrac{n}{2} + 1)}$.

Let us call the measure of a set of spheres $S_{n,r}$ the measure of the set of their centers. That is, if the point $P(x_1, x_2, \dots, x_n)$ is the center of $S_{n,r}$, the measure of a set of spheres $S_{n,r}$ equals the integral, extended over the set, of the differential form

(2.10) $dP = dx_1\, dx_2 \cdots dx_n$.
For instance, the measure of the set of spheres $S_{n,r}$, each of which contains a fixed point Q in its interior, has the value

(2.11) $\int_{Q \in S_{n,r}} dP = S_{n,r}$,

where $S_{n,r}$ is given by (2.9).

The measure $\mu(S_{n,r}, l)$ of the set of spheres $S_{n,r}$, each of which contains totally in its interior a segment of length l (l ≤ 2r), equals the volume of the intersection of two spheres $S_{n,r}$ whose centers are placed at the end points of the given segment. That is, $\mu(S_{n,r}, l)$ equals twice the volume of the spherical segment of an n-sphere of radius r and semiangle α = arc cos (l/2r). We will represent the volume of this spherical segment by $S_{n,r}(\alpha)$ and it may be calculated in the following way: The intersection of the n-sphere with a hyperplane at a distance x from the center is an (n - 1)-dimensional sphere of radius $r' = (r^2 - x^2)^{1/2}$. Let $S_{n-1,r'}$ denote the volume of this (n - 1)-dimensional sphere (given by the general formula (2.9)). The volume of the spherical segment, whose base lies at the distance h = r cos α from the center, will be

$S_{n,r}(\alpha) = \int_h^r S_{n-1,r'}\, dx$.

Putting x = r cos θ and substituting for $S_{n-1,r'}$ the expression given in (2.9), we obtain

$S_{n,r}(\alpha) = \frac{\pi^{(n-1)/2} r^n}{\Gamma(\tfrac{n+1}{2})} \int_0^{\alpha} \sin^n \theta\, d\theta = r S_{n-1,r} \int_0^{\alpha} \sin^n \theta\, d\theta$.

Consequently we can write

(2.12) $\mu(S_{n,r}, l) = 2 r S_{n-1,r} \int_0^{\alpha} \sin^n \theta\, d\theta$,

where $S_{n-1,r}$ is the volume of the (n - 1)-dimensional sphere of radius r and α = arc cos (l/2r).

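In (2.12) we may substitute

(2.13)  $\int_0^{\alpha} \sin^n\theta\, d\theta = \dfrac{(n-1)(n-3)\cdots 3\cdot 1}{n(n-2)\cdots 4\cdot 2}\, \arccos(l/2r) - \dfrac{l}{2r}\left[\dfrac{1}{n}\left(1 - \dfrac{l^2}{4r^2}\right)^{(n-1)/2} + \dfrac{n-1}{n(n-2)}\left(1 - \dfrac{l^2}{4r^2}\right)^{(n-3)/2} + \cdots + \dfrac{(n-1)(n-3)\cdots 3\cdot 1}{n(n-2)\cdots 4\cdot 2}\left(1 - \dfrac{l^2}{4r^2}\right)^{1/2}\right]$

for n even, and

(2.14)  $\int_0^{\alpha} \sin^n\theta\, d\theta = \dfrac{(n-1)(n-3)\cdots 4\cdot 2}{n(n-2)\cdots 5\cdot 3} - \dfrac{l}{2r}\left[\dfrac{1}{n}\left(1 - \dfrac{l^2}{4r^2}\right)^{(n-1)/2} + \dfrac{n-1}{n(n-2)}\left(1 - \dfrac{l^2}{4r^2}\right)^{(n-3)/2} + \cdots + \dfrac{(n-1)(n-3)\cdots 4\cdot 2}{n(n-2)\cdots 5\cdot 3}\right]$

for n odd.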

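In particular, for n = 2, 3 we have

(2.15)  $\mu(S_{2,r}, l) = 4r^2 \int_0^{\alpha} \sin^2\theta\, d\theta = 2r^2 \arccos(l/2r) - \tfrac{1}{2}\, l\,(4r^2 - l^2)^{1/2},$

(2.16)  $\mu(S_{3,r}, l) = 2\pi r^3 \int_0^{\alpha} \sin^3\theta\, d\theta = \tfrac{4}{3}\pi r^3 - \pi r^2 l + \tfrac{1}{12}\pi l^3.$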
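The reduction from (2.12) to the closed forms (2.15) and (2.16) can be checked numerically. The following is only a sketch: the function names and the midpoint-rule quadrature are illustrative choices, not part of the paper, and the volume formula is the one quoted as (2.9).

```python
import math


def sphere_volume(n, r):
    """Volume S_{n,r} of the n-dimensional sphere of radius r, as in (2.9)."""
    return math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)


def sin_power_integral(n, alpha, steps=20000):
    """Midpoint-rule approximation of the integral of sin^n(theta) on [0, alpha]."""
    h = alpha / steps
    return h * sum(math.sin((i + 0.5) * h) ** n for i in range(steps))


def mu(n, r, l):
    """mu(S_{n,r}, l) from (2.12), for a segment of length 0 <= l <= 2r."""
    alpha = math.acos(l / (2 * r))
    return 2 * r * sphere_volume(n - 1, r) * sin_power_integral(n, alpha)


def mu_plane(r, l):
    """Closed form (2.15) for n = 2."""
    return 2 * r**2 * math.acos(l / (2 * r)) - 0.5 * l * math.sqrt(4 * r**2 - l**2)


def mu_space(r, l):
    """Closed form (2.16) for n = 3."""
    return 4 / 3 * math.pi * r**3 - math.pi * r**2 * l + math.pi * l**3 / 12


if __name__ == "__main__":
    r, l = 1.0, 0.7
    print(mu(2, r, l), mu_plane(r, l))   # the two values should agree closely
    print(mu(3, r, l), mu_space(r, l))
```

For l = 0 the measure reduces to the volume of the sphere itself, which gives a further quick consistency check.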


We shall now generalize the formula (2.8) to n-space.

A direction in n-space may be given by the corresponding point on the surface
of the n-dimensional sphere of unit radius, that is, by the end point of the radius
which is parallel to the given direction. The parametric equations of the
n-sphere $\sum_{i=1}^{n} \xi_i^2 = 1$ are

(2.17)
$\xi_1 = \cos\varphi_1$
$\xi_2 = \sin\varphi_1 \cos\varphi_2$
$\xi_3 = \sin\varphi_1 \sin\varphi_2 \cos\varphi_3$
$\cdots$
$\xi_{n-1} = \sin\varphi_1 \sin\varphi_2 \cdots \sin\varphi_{n-2} \cos\varphi_{n-1}$
$\xi_n = \sin\varphi_1 \sin\varphi_2 \cdots \sin\varphi_{n-2} \sin\varphi_{n-1},$

where $0 \le \varphi_i \le \pi$ for $i < n - 1$ and $0 \le \varphi_{n-1} \le 2\pi$. The element of area of
this n-sphere has the value (see [2, p. 109])

(2.18)  $d\sigma = \sin^{n-2}\varphi_1\, \sin^{n-3}\varphi_2 \cdots \sin\varphi_{n-2}\; d\varphi_1\, d\varphi_2 \cdots d\varphi_{n-1}.$

A direction in n-dimensional space may then be given by the n - 1 parameters
$\varphi_1, \varphi_2, \cdots, \varphi_{n-1}$.

Let R be the n-dimensional interval consisting of all points $(x_1, x_2, x_3, \cdots, x_n)$
such that $0 \le x_i \le A_i$ ($i = 1, 2, 3, \cdots, n$), and suppose that R is translated a
distance $l$ ($l \le \min(A_1, A_2, A_3, \cdots, A_n)$) in the direction $(\varphi_1, \varphi_2, \cdots, \varphi_{n-1})$;
the intersection of the translated interval with R is a new interval whose volume
has the value $\prod_{i=1}^{n} (A_i - |x_i|)$, where $x_i = l\xi_i$ ($\xi_i$ given by (2.17)).

Our purpose is to evaluate the integral

(2.19)  $\Phi(R, l) = \int_{E_n} \prod_{i=1}^{n} (A_i - |x_i|)\, d\sigma$

extended over the surface $E_n$ of the n-dimensional sphere of radius unity. We
shall denote by $E_m$ either the surface of the m-dimensional sphere of radius unity
or its area, given, as is known [2, p. 110], by

(2.20)  $E_m = \dfrac{2\pi^{m/2}}{\Gamma(m/2)}.$

Because of the symmetry, the coefficients of all the products $A_{i_1} A_{i_2} \cdots A_{i_{n-k}}$
have the same value

$\alpha_k = (-1)^k\, l^k \int_{E_n} |\xi_1 \xi_2 \cdots \xi_k|\, d\sigma.$

The integral extended over the whole surface $E_n$ equals $2^n$ times the integral
extended over the portion for which $\xi_i \ge 0$. Hence, taking into account (2.17)
and (2.18) we get

(2.21)  $\alpha_k = (-1)^k\, 2^n l^k \int_0^{\pi/2} \cdots \int_0^{\pi/2} \sin^{n+k-3}\varphi_1 \cos\varphi_1\, \sin^{n+k-5}\varphi_2 \cos\varphi_2 \cdots \sin^{n-k-1}\varphi_k \cos\varphi_k\; \sin^{n-k-2}\varphi_{k+1} \cdots \sin\varphi_{n-2}\; d\varphi_1\, d\varphi_2 \cdots d\varphi_{n-1}$
$\qquad = (-1)^k\, \dfrac{2^k l^k E_{n-k}}{(n+k-2)(n+k-4)\cdots(n+k-2k)}$

for $k = 1, 2, \cdots, n - 1$. For $k = n$ we find that

(2.22)  $\alpha_n = (-1)^n\, 2^n l^n \int_0^{\pi/2} \cdots \int_0^{\pi/2} \sin^{2n-3}\varphi_1 \cos\varphi_1 \cdots \sin\varphi_{n-1} \cos\varphi_{n-1}\; d\varphi_1\, d\varphi_2 \cdots d\varphi_{n-1} = (-1)^n\, \dfrac{2^n l^n}{(2n-2)(2n-4)\cdots 4\cdot 2}.$

Hence, we have the following general formula

(2.23)  $\Phi(R, l) = A_1 A_2 \cdots A_n E_n + (-1)^n\, \dfrac{2^n l^n}{(2n-2)(2n-4)\cdots 4\cdot 2} + \sum_{k=1}^{n-1} (-1)^k \Big(\sum_{i_1 < \cdots < i_{n-k}} A_{i_1} \cdots A_{i_{n-k}}\Big) \dfrac{2^k l^k E_{n-k}}{(n+k-2)(n+k-4)\cdots(n+k-2k)}.$

In particular, for n = 2 this result coincides with (2.8). For n = 3 we have

(2.24)  $\Phi(R, l) = 4\pi A_1 A_2 A_3 - l^3 - 2\pi l\,(A_1 A_2 + A_1 A_3 + A_2 A_3) + \tfrac{8}{3}\, l^2 (A_1 + A_2 + A_3).$
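A short sketch of how (2.23) can be evaluated and compared with (2.24) is given below. The function names are illustrative only; the plane case is checked against a direct numerical integration of (2.19) rather than against (2.8), which lies outside this section.

```python
import math
from itertools import combinations


def unit_sphere_area(m):
    """Area E_m of the unit sphere in m-space, formula (2.20)."""
    return 2 * math.pi ** (m / 2) / math.gamma(m / 2)


def phi(sides, l):
    """Phi(R, l) from (2.23) for the n-interval with edge lengths `sides`."""
    n = len(sides)
    total = math.prod(sides) * unit_sphere_area(n)
    # Term in l^n: (-1)^n 2^n l^n / ((2n-2)(2n-4)...4.2)
    total += (-1) ** n * 2 ** n * l ** n / math.prod(range(2, 2 * n - 1, 2))
    for k in range(1, n):
        # Sum of all products of n-k distinct edge lengths.
        sym = sum(math.prod(c) for c in combinations(sides, n - k))
        denom = math.prod(n + k - 2 * j for j in range(1, k + 1))
        total += (-1) ** k * sym * 2 ** k * l ** k * unit_sphere_area(n - k) / denom
    return total


def phi_space(a1, a2, a3, l):
    """Closed form (2.24) for n = 3."""
    pairs = a1 * a2 + a1 * a3 + a2 * a3
    return (4 * math.pi * a1 * a2 * a3 - l**3 - 2 * math.pi * l * pairs
            + 8 / 3 * l**2 * (a1 + a2 + a3))


def phi_plane_numeric(a1, a2, l, steps=100000):
    """Direct midpoint-rule evaluation of (2.19) for n = 2."""
    h = 2 * math.pi / steps
    return h * sum(
        (a1 - l * abs(math.cos((i + 0.5) * h))) * (a2 - l * abs(math.sin((i + 0.5) * h)))
        for i in range(steps)
    )


if __name__ == "__main__":
    print(phi([3.0, 4.0, 5.0], 1.5), phi_space(3.0, 4.0, 5.0, 1.5))
    print(phi([2.0, 3.0], 0.8), phi_plane_numeric(2.0, 3.0, 0.8))
```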


3. First problem. We can now solve the first problem (i) stated in the
introduction. Denoting by the same letters either sets or their measures, we consider,
as in [1] and [4], the set Y of points of R that do not belong to X. We have
identically:

(3.1)  $X + Y = R.$

The general method of Robbins [3], taking into account (2.2), gives immediately
the first moments

(3.2)  $E(Y) = R\left(1 - \dfrac{\rho}{R'}\right)^{N}, \qquad E(X) = R\left\{1 - \left(1 - \dfrac{\rho}{R'}\right)^{N}\right\},$

where $R = A_1 A_2$, $R' = (A_1 + 2\delta)(A_2 + 2\delta)$, $\rho = ab$.

Our remaining problem is that of evaluating the second moment of X. Let
$x_i, y_i, \varphi_i$ ($i = 1, 2, 3, \cdots, N$) be the coordinates of the N rectangles ρ (section 2)
and let us put, as in (2.1), $d\rho_i = dx_i\, dy_i\, d\varphi_i$. Let $P(x, y)$ and $P_0(x_0, y_0)$ be two
points which belong to R and let us put $dP = dx\, dy$, $dP_0 = dx_0\, dy_0$. Let us
consider the following multiple integral

(3.3)  $J = \int \dfrac{dP\, dP_0\, d\rho_1\, d\rho_2 \cdots d\rho_N}{(2\pi R')^N}$

extended over the sets of rectangles $\rho_i$ (congruent with ρ) such that the point $(x_i, y_i)$
belongs to R', $0 \le \varphi_i \le 2\pi$, and which do not contain either P or $P_0$. That is, the
domain of integration of J is defined by

(3.4)  $-\delta \le x_i \le A_1 + \delta, \quad -\delta \le y_i \le A_2 + \delta, \quad 0 \le \varphi_i \le 2\pi,$
$\qquad P \in R, \quad P_0 \in R, \quad P \notin \rho_i, \quad P_0 \notin \rho_i \quad (i = 1, 2, \cdots, N).$


In order to calculate J, we can first keep the rectangles $\rho_i$ fixed; the points P
and $P_0$ can then vary independently over the set of points Y. That gives

(3.5)  $J = \int Y^2\, \dfrac{d\rho_1\, d\rho_2 \cdots d\rho_N}{(2\pi R')^N} = E(Y^2).$

We can now reverse the order of integration, an operation which is obviously
justified in this case. Keeping P and $P_0$ fixed, we can vary each rectangle $\rho_i$
over the set of positions in which it does not contain either P or $P_0$; letting $l$
denote the distance $PP_0$, we have, according to (2.5),

(3.6)  $J = \int_{P \in R,\; P_0 \in R} \left(1 - \dfrac{4\pi\rho - \mu(\rho, l)}{2\pi R'}\right)^{N} dP\, dP_0.$

In order to evaluate this integral we divide it into two parts $J = J_1 + J_2$,
according as $0 \le l \le d$ or $d \le l \le D$, where $d = (a^2 + b^2)^{1/2}$ and $D = (A_1^2 + A_2^2)^{1/2}$.
In the interval $0 \le l \le d$ we introduce the new variables of integration $l, \theta$
related to $x, y, x_0, y_0$ by

(3.7)  $x_0 = x + l\cos\theta, \qquad y_0 = y + l\sin\theta,$

whence

$\dfrac{\partial(x, y, x_0, y_0)}{\partial(x, y, l, \theta)} = l.$

In terms of the new variables we have

$J_1 = \int \left(1 - \dfrac{4\pi\rho - \mu(\rho, l)}{2\pi R'}\right)^{N} l\; dP\, dl\, d\theta.$

In this integral the point P can vary over the intersection of R with the figure
obtained by translating R a distance $l$ in the direction θ; that is, the integration
of dP gives the function $F(R, l, \theta)$ defined in section 2. According to (2.6) we
therefore have

(3.8)  $J_1 = \int_0^{d} \left(1 - \dfrac{4\pi\rho - \mu(\rho, l)}{2\pi R'}\right)^{N} \Phi(R, l)\, l\, dl,$

where $\mu(\rho, l)$ is given by (2.4) and $\Phi(R, l)$ by (2.8).

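In order to evaluate $J_2$ we observe that in the interval $d \le l \le D$, $\mu(\rho, l) = 0$
and we have

$J_2 = \left(1 - \dfrac{2\rho}{R'}\right)^{N} \int_{d \le l \le D} dP\, dP_0 = \left(1 - \dfrac{2\rho}{R'}\right)^{N} \left\{\int_{0 \le l \le D} dP\, dP_0 - \int_{0 \le l \le d} dP\, dP_0\right\}.$

Further we have

(3.9)  $\int_{0 \le l \le D} dP\, dP_0 = R^2$

and with the change of variables (3.7) and the formula (2.8) we find that

(3.10)  $\int_{0 \le l \le d} dP\, dP_0 = \int_0^{d} \Phi(R, l)\, l\, dl = \pi R\, d^2 - \tfrac{4}{3}(A_1 + A_2)\, d^3 + \tfrac{1}{2}\, d^4.$

Collecting (3.8), (3.9), (3.10) and taking into account (3.5) we have

(3.11)  $E(Y^2) = \int_0^{d} \left(1 - \dfrac{4\pi\rho - \mu(\rho, l)}{2\pi R'}\right)^{N} \Phi(R, l)\, l\, dl + \left(1 - \dfrac{2\rho}{R'}\right)^{N} \left\{R^2 - \pi A_1 A_2\, d^2 + \tfrac{4}{3}(A_1 + A_2)\, d^3 - \tfrac{1}{2}\, d^4\right\},$

where $\rho = ab$, $R = A_1 A_2$, $R' = (A_1 + 2\delta)(A_2 + 2\delta)$, $\mu(\rho, l)$ is given by (2.4) and
$\Phi(R, l)$ by (2.8).

For the variance of X and of Y, we have by (3.1) and (3.2)

$\sigma_X^2 = E(X^2) - E^2(X) = E(Y^2) - E^2(Y)$
$\quad = \int_0^{d} \left(1 - \dfrac{4\pi\rho - \mu(\rho, l)}{2\pi R'}\right)^{N} \Phi(R, l)\, l\, dl + \left(1 - \dfrac{2\rho}{R'}\right)^{N} \left\{R^2 - \pi A_1 A_2\, d^2 + \tfrac{4}{3}(A_1 + A_2)\, d^3 - \tfrac{1}{2}\, d^4\right\} - R^2 \left(1 - \dfrac{\rho}{R'}\right)^{2N},$

which completes the solution of our first problem stated in the introduction.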

4. Second problem. In order to solve the second problem (ii) stated in the
introduction we will follow the same method as in the preceding section.

Let X be the intersection of the set-theoretical sum of the N n-dimensional
spheres $S_{n,r}$ of radius r with the n-interval R. Let us call Y the set of those points
of R that do not belong to X, that is,

(4.1)  $X + Y = R.$

The general method of Robbins gives immediately

(4.2)  $E(Y) = R\left(1 - \dfrac{S_{n,r}}{R'}\right)^{N},$

where $R' = \prod_{i=1}^{n} (A_i + 2\delta)$, and $S_{n,r}$ is given by (2.9).
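Formula (4.2) lends itself to a quick Monte Carlo check in the plane. The sketch below is an illustration only: the parameter values, the function names and the choice δ = r are assumptions made here, and the simulation estimates E(Y) by sampling a random point of R together with N random centers in the enlarged interval R'.

```python
import math
import random


def expected_uncovered(a1, a2, r, delta, n_discs):
    """E(Y) from (4.2) for n = 2: R (1 - S_{2,r}/R')^N."""
    area = a1 * a2
    enlarged = (a1 + 2 * delta) * (a2 + 2 * delta)
    return area * (1 - math.pi * r**2 / enlarged) ** n_discs


def simulate_uncovered(a1, a2, r, delta, n_discs, trials=20000, seed=0):
    """Monte Carlo estimate of E(Y); assumes delta >= r."""
    rng = random.Random(seed)
    uncovered = 0
    for _ in range(trials):
        # A random point Q of R and N independent centers uniform in R'.
        qx, qy = rng.uniform(0, a1), rng.uniform(0, a2)
        hit = False
        for _ in range(n_discs):
            cx = rng.uniform(-delta, a1 + delta)
            cy = rng.uniform(-delta, a2 + delta)
            if (cx - qx) ** 2 + (cy - qy) ** 2 < r * r:
                hit = True
                break
        uncovered += not hit
    # Fraction of uncovered sample points times the area of R.
    return a1 * a2 * uncovered / trials


if __name__ == "__main__":
    print(expected_uncovered(1.0, 1.0, 0.2, 0.2, 10))
    print(simulate_uncovered(1.0, 1.0, 0.2, 0.2, 10))
```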

We now proceed to calculate $E(Y^2)$. For this purpose let $Q_1(y_1^1, y_2^1, \cdots, y_n^1)$
and $Q_2(y_1^2, y_2^2, \cdots, y_n^2)$ be two points which belong to R and $P_i(x_1^i, x_2^i, \cdots, x_n^i)$
be the centers of the N spheres $S_{n,r}$. Let us put

(4.3)  $dQ_i = dy_1^i\, dy_2^i \cdots dy_n^i \ (i = 1, 2), \qquad dP_i = dx_1^i\, dx_2^i \cdots dx_n^i \ (i = 1, 2, \cdots, N).$

Consider the integral

(4.4)  $J = \int \dfrac{dQ_1\, dQ_2\, dP_1\, dP_2 \cdots dP_N}{R'^{N}}$

extended over the domain defined by

$Q_1 \in R, \quad Q_2 \in R, \quad P_i \in R', \quad \overline{Q_1 P_i} \ge r, \quad \overline{Q_2 P_i} \ge r \quad (i = 1, 2, \cdots, N).$

If we keep $P_1, P_2, P_3, \cdots, P_N$ fixed, each point $Q_1, Q_2$ can vary independently
over the set Y; consequently we have

(4.5)  $J = \int Y^2\, \dfrac{dP_1\, dP_2 \cdots dP_N}{R'^{N}} = E(Y^2).$

On the other hand, if we keep $Q_1$ and $Q_2$ fixed, the integral of each $dP_i$ gives
$R' - 2S_{n,r} + \mu(S_{n,r}, l)$ where $\mu(S_{n,r}, l)$ is given by (2.12) and $l = \overline{Q_1 Q_2}$.
Hence we have

(4.6)  $J = \int_{Q_1 \in R,\; Q_2 \in R} \left(1 - \dfrac{2S_{n,r} - \mu(S_{n,r}, l)}{R'}\right)^{N} dQ_1\, dQ_2.$
In order to calculate this integral we split it into two parts $J = J_1 + J_2$,
according as $0 \le l \le 2r$ or $2r \le l \le D$, where $D = \left(\sum_{i=1}^{n} A_i^2\right)^{1/2}$. In the interval
$0 \le l \le 2r$ we introduce the new variables of integration $l, \varphi_1, \varphi_2, \cdots, \varphi_{n-1}$
related to $y_1^1, y_2^1, \cdots, y_n^1, y_1^2, y_2^2, \cdots, y_n^2$ by

(4.7)  $y_i^2 = y_i^1 + l\xi_i \quad (i = 1, 2, \cdots, n),$

where $\xi_i$ is given in (2.17). It is found that

$\dfrac{\partial(y_1^1, y_2^1, \cdots, y_n^1, y_1^2, y_2^2, \cdots, y_n^2)}{\partial(y_1^1, y_2^1, \cdots, y_n^1, l, \varphi_1, \cdots, \varphi_{n-1})} = l^{n-1} \sin^{n-2}\varphi_1\, \sin^{n-3}\varphi_2 \cdots \sin\varphi_{n-2}.$

Hence we have

(4.8)  $dQ_1\, dQ_2 = l^{n-1}\, dQ_1\, d\sigma\, dl,$

where $d\sigma$ denotes the element of area of the n-dimensional sphere of unit radius,
given by (2.18). The same method used in section 3 gives

(4.9)  $J_1 = \int_0^{2r} \left(1 - \dfrac{2S_{n,r} - \mu(S_{n,r}, l)}{R'}\right)^{N} \Phi(R, l)\, l^{n-1}\, dl,$

where $\Phi(R, l)$ is given by (2.23).

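In the interval $2r \le l \le D$, $\mu(S_{n,r}, l) = 0$ and we have

(4.10)  $J_2 = \left(1 - \dfrac{2S_{n,r}}{R'}\right)^{N} \left\{\int_{0 \le l \le D} dQ_1\, dQ_2 - \int_{0 \le l \le 2r} dQ_1\, dQ_2\right\}.$

Now we have

(4.11)  $\int_{0 \le l \le D} dQ_1\, dQ_2 = R^2$

and with the change of variables (4.7) we readily find that

(4.12)  $\int_{0 \le l \le 2r} dQ_1\, dQ_2 = \int_0^{2r} \Phi(R, l)\, l^{n-1}\, dl.$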

Collecting (4.9), (4.10), (4.11), (4.12) and taking into account (4.5) and
(2.23) we have

(4.13)  $E(Y^2) = \int_0^{2r} \left(1 - \dfrac{2S_{n,r} - \mu(S_{n,r}, l)}{R'}\right)^{N} \Phi(R, l)\, l^{n-1}\, dl$
$\qquad + \left(1 - \dfrac{2S_{n,r}}{R'}\right)^{N} \Big\{R^2 - \dfrac{(2r)^n}{n}\, R\, E_n - (-1)^n \dfrac{2^n (2r)^{2n}}{2n(2n-2)\cdots 4\cdot 2} - \sum_{k=1}^{n-1} (-1)^k \Big(\sum_{i_1 < \cdots < i_{n-k}} A_{i_1} \cdots A_{i_{n-k}}\Big) \dfrac{2^k E_{n-k}\, (2r)^{n+k}}{(n+k)(n+k-2)\cdots(n+k-2k)}\Big\},$

where $R = \prod A_i$, $R' = \prod (A_i + 2\delta)$; $S_{n,r}$ is given by (2.9), $E_m$ by (2.20), $\mu(S_{n,r}, l)$
by (2.12) and $\Phi(R, l)$ by (2.23). In particular, for n = 2, we obtain the value
given by Robbins [3, (30)], by use of (2.8), (2.15) and the equations $S_{2,r} = \pi r^2$,
$E_1 = 2$. For n = 3, the case of ordinary space, it follows from (2.16), (2.24)
and the equations $S_{3,r} = \tfrac{4}{3}\pi r^3$, $E_3 = 4\pi$, $E_2 = 2\pi$, that

(4.14)  $E(Y^2) = \int_0^{2r} \left(1 - \dfrac{\tfrac{4}{3}\pi r^3 + \pi r^2 l - \tfrac{1}{12}\pi l^3}{R'}\right)^{N} \left[4\pi R - l^3 - 2\pi (A_1 A_2 + A_1 A_3 + A_2 A_3)\, l + \tfrac{8}{3}(A_1 + A_2 + A_3)\, l^2\right] l^2\, dl$
$\qquad + \left(1 - \dfrac{8\pi r^3}{3R'}\right)^{N} \left\{R^2 - \tfrac{32}{3}\pi R\, r^3 + 8\pi (A_1 A_2 + A_1 A_3 + A_2 A_3)\, r^4 - \tfrac{256}{15}(A_1 + A_2 + A_3)\, r^5 + \tfrac{32}{3}\, r^6\right\}.$

In this case the exact evaluation is easy if one expands the binomial under the
sign of the integral and integrates term by term.
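The polynomial in the braces of (4.14) comes from integrating Φ(R, l) l² over 0 ≤ l ≤ 2r, as in (4.12). The sketch below checks that step numerically; the Simpson rule and the function names are illustrative assumptions, not part of the paper.

```python
import math


def phi_space(a1, a2, a3, l):
    """Phi(R, l) for n = 3, formula (2.24)."""
    pairs = a1 * a2 + a1 * a3 + a2 * a3
    return (4 * math.pi * a1 * a2 * a3 - l**3 - 2 * math.pi * l * pairs
            + 8 / 3 * l**2 * (a1 + a2 + a3))


def simpson(f, lo, hi, steps=1000):
    """Composite Simpson rule with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def close_pair_measure(a1, a2, a3, r):
    """Closed form of the integral (4.12) for n = 3, as subtracted in (4.14)."""
    volume = a1 * a2 * a3
    pairs = a1 * a2 + a1 * a3 + a2 * a3
    return (32 / 3 * math.pi * volume * r**3 - 8 * math.pi * pairs * r**4
            + 256 / 15 * (a1 + a2 + a3) * r**5 - 32 / 3 * r**6)


if __name__ == "__main__":
    a1, a2, a3, r = 3.0, 4.0, 5.0, 0.6
    numeric = simpson(lambda l: phi_space(a1, a2, a3, l) * l**2, 0.0, 2 * r)
    print(numeric, close_pair_measure(a1, a2, a3, r))
```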

From (4.1) we see that $\sigma_X^2 = E(X^2) - E^2(X) = E(Y^2) - E^2(Y)$. Thus,
from (4.2) and (4.13) we obtain immediately the second moment $E(X^2)$ and the
variance $\sigma_X^2$ of X.


5. Remark. In the second problem we can replace the n-intervals R and
R' by concentric n-dimensional spheres. The problem may then be stated as
follows:

Let $S_{n,a}$ denote a fixed n-dimensional sphere of radius a and $S_{n,a+\delta}$ the
concentric n-dimensional sphere of radius $a + \delta$. $S_{n,a}$ and $S_{n,a+\delta}$ shall also denote
the corresponding volumes. Let a fixed number N of n-dimensional spheres
with radii r ($r \le \min(a, \delta)$) be chosen independently with the probability density
function for the center of each $S_{n,r}$ constant and equal to $1/S_{n,a+\delta}$ in $S_{n,a+\delta}$
and zero outside this n-sphere. Let X denote the intersection of the
set-theoretical sum of the N n-spheres with $S_{n,a}$; we wish to evaluate the first two
moments of the measure of X.

It suffices to observe that in this case we have

(5.1)  $\Phi(S_{n,a}, l) = 2a\, E_n\, S_{n-1,a} \int_0^{\alpha} \sin^n\theta\, d\theta,$

where $S_{n-1,a}$ is the volume of the (n - 1)-dimensional sphere of radius a and
$\alpha = \arccos(l/2a)$.

The same method used in section 4 gives

(5.2)  $E(Y) = S_{n,a} \left(1 - \dfrac{S_{n,r}}{S_{n,a+\delta}}\right)^{N},$

(5.3)  $E(Y^2) = \int_0^{2r} \left(1 - \dfrac{2S_{n,r} - \mu(S_{n,r}, l)}{S_{n,a+\delta}}\right)^{N} \Phi(S_{n,a}, l)\, l^{n-1}\, dl + \left(1 - \dfrac{2S_{n,r}}{S_{n,a+\delta}}\right)^{N} \left\{S_{n,a}^2 - \int_0^{2r} \Phi(S_{n,a}, l)\, l^{n-1}\, dl\right\},$

where $\Phi(S_{n,a}, l)$ is given by (5.1).

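In particular, for n = 2, by use of (5.1), (2.15) and the indefinite integrals

$\int \arccos(l/2a)\, l\, dl = \left(\tfrac{1}{2} l^2 - a^2\right) \arccos(l/2a) - \tfrac{1}{4}\, l\,(4a^2 - l^2)^{1/2} + \text{constant},$

$\int l^2 (4a^2 - l^2)^{1/2}\, dl = -\tfrac{1}{4}\, l\,(4a^2 - l^2)^{3/2} + \tfrac{1}{2}\, a^2 l\,(4a^2 - l^2)^{1/2} + 2a^4 \arcsin(l/2a) + \text{constant},$

we find that

$E(Y^2) = 2\pi \int_0^{2r} \left(1 - \dfrac{2\pi r^2 - 2r^2 \arccos(l/2r) + \tfrac{1}{2}\, l\,(4r^2 - l^2)^{1/2}}{\pi (a + \delta)^2}\right)^{N} \left(2a^2 \arccos(l/2a) - \tfrac{1}{2}\, l\,(4a^2 - l^2)^{1/2}\right) l\, dl$
$\qquad + \left(1 - \dfrac{2r^2}{(a + \delta)^2}\right)^{N} \Big[\pi^2 a^4 - 2\pi \Big\{2a^2 (2r^2 - a^2) \arccos(r/a) - 3a^2 r\,(a^2 - r^2)^{1/2} + \pi a^4 + 2r\,(a^2 - r^2)^{3/2} - a^4 \arcsin(r/a)\Big\}\Big].$

For n = 3, we have by (5.1) and (2.16)

$E(Y^2) = 4\pi \int_0^{2r} \left(1 - \dfrac{16r^3 + 12r^2 l - l^3}{16 (a + \delta)^3}\right)^{N} \left(\tfrac{4}{3}\pi a^3 - \pi a^2 l + \tfrac{1}{12}\pi l^3\right) l^2\, dl$
$\qquad + 4\pi \left(1 - \dfrac{2r^3}{(a + \delta)^3}\right)^{N} \left[\tfrac{4}{9}\pi a^6 - \tfrac{32}{9}\pi a^3 r^3 + 4\pi a^2 r^4 - \tfrac{8}{9}\pi r^6\right].$

From (5.2) and (5.3) with the use of the relation $\sigma_X^2 = E(X^2) - E^2(X) =
E(Y^2) - E^2(Y)$ we obtain immediately the second moment $E(X^2)$ and the
variance $\sigma_X^2$ of X.

ON A TEST OF WHETHER ONE OF TWO RANDOM VARIABLES
IS STOCHASTICALLY LARGER THAN THE OTHER

By H. B. Mann and D. R. Whitney
Ohio State University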

1. Summary. Let x and y be two random variables with continuous cumulative
distribution functions f and g. A statistic U depending on the relative ranks
of the x's and y's is proposed for testing the hypothesis f = g. Wilcoxon proposed
an equivalent test in the Biometrics Bulletin, December, 1945, but gave only a
few points of the distribution of his statistic.

Under the hypothesis f = g the probability of obtaining a given U in a sample
of n x's and m y's is the solution of a certain recurrence relation involving n
and m. Using this recurrence relation tables have been computed giving the
probability of U for samples up to n = m = 8. At this point the distribution is
almost normal.

From the recurrence relation explicit expressions for the mean, variance, and
fourth moment are obtained. The 2rth moment is shown to have a certain
form which enabled us to prove that the limit distribution is normal if m, n go to
infinity in any arbitrary manner.

The test is shown to be consistent with respect to the class of alternatives
f(x) > g(x) for every x.

2. Introduction. Let x and y be two random variables having continuous
cumulative distribution functions f and g respectively. The variable x will be
called stochastically smaller than y if f(a) > g(a) for every a. We wish to test
the hypothesis f = g against the alternative that x is stochastically smaller than
y. Such alternatives are of great importance in testing, for instance, the effect
of treatments on some measurement. One may think of x as the values of
certain measurements in the control group and of y as the values of the same
measurement in a group which received treatment. In a particular instance
the protective effect against infection by certain bacteria was investigated.
Two groups of rats were used in the experiment, the first group receiving no
treatment, the second group receiving the drug. Both groups were then infected
with supposedly equally diluted cultures of the bacteria under investigation.
Most of the rats in both groups died, but the time of survival was measured and
it was desired to test whether the drug had the effect of prolonging the life of the
rats. It was desired to make inferences from the effect on rats to the effect the
drug would have on humans. Thus, the only relevant alternative to the
hypothesis that survival times are not influenced by the drug is that the survival
time of those rats which received treatment is stochastically larger than that
of the control group.

3. The U test. Let the quantities $x_1, \cdots, x_n, y_1, \cdots, y_m$ be arranged in
order. This arrangement is unique with probability 1 if $P(x_i = y_j) = 0$ and
this follows from our assumption of continuity. Let U count the number of
times a y precedes an x. If $P(U \le \bar{U}) = \alpha$ under the null hypothesis, the
test will be considered significant on the significance level α if $U \le \bar{U}$ and the
hypothesis of identical distributions of x and y will be rejected.

This test was first proposed by Wilcoxon [1]. His statistic T is the sum of the
ranks of the y's in the ordered sequence of x's and y's. In general

$U = mn + \dfrac{m(m+1)}{2} - T$

and this gives a simple way of computing U. Wilcoxon, however, treated only
the case m = n and in this case he tabulated only 3 points of the distribution of
T. Since the test seems of great utility it seemed worthwhile to compute the
variance, the moments and the limit distribution of U and to investigate the
class of alternatives with respect to which the test is consistent.

Although this paper is written in terms of U and the probabilities of U are
tabulated, the results can be easily interpreted in terms of T if so desired.
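The relation between U and Wilcoxon's T can be illustrated on a small sample. This is a minimal sketch: the data values and function names are made up for the illustration, and the observations are assumed to be free of ties, in line with the continuity assumption.

```python
def u_statistic(xs, ys):
    """Number of (x, y) pairs in which the y precedes (is smaller than) the x."""
    return sum(1 for x in xs for y in ys if y < x)


def rank_sum_of_ys(xs, ys):
    """Wilcoxon's T: sum of the ranks of the y's in the pooled ordered sample."""
    pooled = sorted(xs + ys)
    # Ranks start at 1; distinct values are assumed, so index() is unambiguous.
    return sum(pooled.index(y) + 1 for y in ys)


if __name__ == "__main__":
    xs = [1.1, 2.3, 0.7, 4.0]   # n = 4 hypothetical control values
    ys = [3.2, 0.5, 5.1]        # m = 3 hypothetical treatment values
    n, m = len(xs), len(ys)
    u = u_statistic(xs, ys)
    t = rank_sum_of_ys(xs, ys)
    print(u, t, m * n + m * (m + 1) // 2 - t)
```

Here the y's occupy ranks 1, 5 and 7, so T = 13 and both routes give U = 5.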

4. The distribution of U. Consider now ordered sequences of n x's and
m y's. Since it is only the relation between x and y that matters we replace
each x by a 0 and each y by a 1. Let U count the number of times a 1 precedes a
0. Let $\bar{p}_{nm}(U)$ be the number of sequences of n 0's and m 1's in each of which a 1
precedes a 0 U times. By examining a sequence with the last term omitted we
arrive at the recurrence relation:

$\bar{p}_{nm}(U) = \bar{p}_{n-1,m}(U - m) + \bar{p}_{n,m-1}(U),$

where $\bar{p}_{ij}(U) = 0$ if $U < 0$ and $\bar{p}_{i0}(U)$, $\bar{p}_{0i}(U)$ are zero or one according
as $U \ne 0$ or $U = 0$.

Under the null hypothesis each of the $(m + n)!/(m!\, n!)$ sequences of n 0's and
m 1's is equally likely. Consequently if $p_{nm}(U)$ represents the probability of a
sequence in which a 1 precedes a 0 U times then

(1)  $p_{nm}(U) = \dfrac{n}{n + m}\, p_{n-1,m}(U - m) + \dfrac{m}{n + m}\, p_{n,m-1}(U).$

Using the recurrence relation (1) the probabilities $p_{nm}(U)$ have been tabulated
for $m \le n \le 8$ (Table I). For m = n = 8 the distribution of $U - \tfrac{1}{2}(nm + 1)$
differs only a negligible amount from the normal distribution. We shall, in the
following, derive the mean, the variance, and the fourth moment of U, and
prove that the limit distribution of U is normal if n and m both approach infinity
in any arbitrary manner.
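The recurrence for the number of sequences translates directly into a short memoized routine. The sketch below (function names are illustrative) works with exact fractions and prints the cumulative probabilities P(U ≤ u) for n = m = 3.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count(n, m, u):
    """Sequences of n 0's and m 1's in which a 1 precedes a 0 exactly u times."""
    if u < 0:
        return 0
    if n == 0 or m == 0:
        return 1 if u == 0 else 0
    # Drop the last term: a final 0 is preceded by all m 1's, a final 1 adds nothing.
    return count(n - 1, m, u - m) + count(n, m - 1, u)


def cumulative(n, m, u):
    """P(U <= u) under the null hypothesis, as an exact fraction."""
    favourable = sum(count(n, m, v) for v in range(u + 1))
    return Fraction(favourable, comb(n + m, n))


if __name__ == "__main__":
    for u in range(6):
        print(u, float(cumulative(3, 3, u)))
```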

It is obvious that •« 

Since the probability of the ith 1 preceding the jth 0 is 1/2, we have

(2)  $E(U) = nm/2.$

TABLE I

Probability of Obtaining a U not Larger than that Tabulated in Comparing Samples of n and m

n = 3

 U \ m     1      2      3
   0     .250   .100   .050
   1     .500   .200   .100
   2     .750   .400   .200
   3            .600   .350
   4                   .500
   5                   .650

n = 4

 U \ m     1      2      3      4
   0     .200   .067   .028   .014
   1     .400   .133   .057   .029
   2     .600   .267   .114   .057
   3            .400   .200   .100
   4            .600   .314   .171
   5                   .429   .243
   6                   .571   .343
   7                          .443
   8                          .557

n = 6

 U \ m     1      2      3      4      5      6
   0     .143   .036   .012   .005   .002   .001
   1     .286   .071   .024   .010   .004   .002
   2     .428   .143   .048   .019   .009   .004
   3     .571   .214   .083   .033   .015   .008
   4            .321   .131   .057   .026   .013
   5            .429   .190   .086   .041   .021
   6            .571   .274   .129   .063   .032
   7                   .357   .176   .089   .047
   8                   .452   .238   .123   .066
   9                   .548   .305   .165   .090
  10                          .381   .214   .120
  11                          .457   .268   .155
  12                          .545   .331   .197
  13                                 .396   .242
  14                                 .465   .294
  15                                 .535   .350
  16                                        .409
  17                                        .469
  18                                        .531

TABLE I (Continued)

n = 7

 U \ m     1      2      3      4      5      6      7
   0     .125   .028   .008   .003   .001   .001   .000
   1     .250   .056   .017   .006   .003   .001   .001
   2     .375   .111   .033   .012   .005   .002   .001
   3     .500   .167   .058   .021   .009   .004   .002
   4     .625   .250   .092   .036   .015   .007   .003
   5            .333   .133   .055   .024   .011   .006
   6            .444   .192   .082   .037   .017   .009
   7            .556   .258   .115   .053   .026   .013
   8                   .333   .158   .074   .037   .019
   9                   .417   .206   .101   .051   .027
  10                   .500   .264   .134   .069   .036
  11                   .583   .324   .172   .090   .049
  12                          .394   .216   .117   .064
  13                          .464   .265   .147   .082
  14                          .538   .319   .183   .104
  15                                 .378   .223   .130
  16                                 .438   .267   .159
  17                                 .500   .314   .191
  18                                 .562   .365   .228
  19                                        .418   .267
  20                                        .473   .310
  21                                        .527   .355
  22                                               .402
  23                                               .451
  24                                               .500

We now seek an expression for $E_{nm}(u^2)$ where $u = U - nm/2$. After
multiplying (1) by $(U - nm/2)^2$, using

$E_{nm}(u^2) = \sum_{U} (U - nm/2)^2\, p_{nm}(U)$

and expanding:

(3)  $E_{nm}(u^2) = \dfrac{n}{n + m}\, E_{n-1,m}(u^2) + \dfrac{m}{n + m}\, E_{n,m-1}(u^2) + nm/4,$

where $E_{nm}(u^2)$ denotes the expectation of $(U - nm/2)^2$ in sequences with n 0's
and m 1's. The initial conditions of (3) are seen by direct calculation to be

(4)  $E_{n0}(u^2) = E_{0m}(u^2) = 0.$

By substitution $E_{nm}(u^2) = nm(n + m + 1)/12$ is a solution of the recurrence
relation (3) and its initial conditions (4). Hence, it follows by mathematical
induction that

(5)  $E_{nm}(u^2) = nm(n + m + 1)/12.$

The fourth moment is similarly a solution of the recurrence relation

(6)  $E_{nm}(u^4) = \dfrac{n}{n + m}\, E_{n-1,m}(u^4) + \dfrac{m}{n + m}\, E_{n,m-1}(u^4) + \dfrac{nm}{16}\,(2n^2 m + 2nm^2 - n^2 - m^2 - nm)$

which is obtained from (1) by multiplication by $(U - nm/2)^4$ and expansion.
The initial conditions of (6) are found by direct calculation to be

(7)  $E_{n0}(u^4) = E_{0m}(u^4) = 0.$

It may be verified that

(8)  $E_{nm}(u^4) = \dfrac{nm(n + m + 1)}{240}\,(5n^2 m + 5nm^2 - 2n^2 - 2m^2 + 3nm - 2n - 2m)$

satisfies the recurrence relation (6) and its initial conditions (7) and hence (8)
follows by mathematical induction.
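Formulas (5) and (8) can be confirmed against the exact null distribution for small samples. The following sketch (function names are illustrative) rebuilds the distribution from the counting recurrence of section 4 and compares central moments as exact fractions.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count(n, m, u):
    """Sequences of n 0's and m 1's in which a 1 precedes a 0 exactly u times."""
    if u < 0:
        return 0
    if n == 0 or m == 0:
        return 1 if u == 0 else 0
    return count(n - 1, m, u - m) + count(n, m - 1, u)


def central_moment(n, m, k):
    """Exact E[(U - nm/2)^k] under the null hypothesis."""
    total = comb(n + m, n)
    mean = Fraction(n * m, 2)
    return sum(Fraction(count(n, m, u), total) * (u - mean) ** k
               for u in range(n * m + 1))


def second_moment_formula(n, m):
    """Right side of (5)."""
    return Fraction(n * m * (n + m + 1), 12)


def fourth_moment_formula(n, m):
    """Right side of (8)."""
    poly = 5 * n * n * m + 5 * n * m * m - 2 * n * n - 2 * m * m + 3 * n * m - 2 * n - 2 * m
    return Fraction(n * m * (n + m + 1), 240) * poly


if __name__ == "__main__":
    for n in range(1, 5):
        for m in range(1, 5):
            ok2 = central_moment(n, m, 2) == second_moment_formula(n, m)
            ok4 = central_moment(n, m, 4) == fourth_moment_formula(n, m)
            print(n, m, ok2, ok4)
```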

To investigate the limit distribution of u as n, m become infinite we investigate
the 2rth moment. Following the same procedure as in the case of the second and
fourth moments and using the symmetry of the distribution to find the odd
moments zero we get the following recurrence relation,

(9)  $E_{nm}(u^{2r}) = \dfrac{1}{n + m} \sum_{\alpha=0}^{r} \binom{2r}{2\alpha} \dfrac{1}{2^{2\alpha}} \left[n\, m^{2\alpha} E_{n-1,m}(u^{2r-2\alpha}) + m\, n^{2\alpha} E_{n,m-1}(u^{2r-2\alpha})\right].$

For r = 1, 2 it is known that $E_{nm}(u^{2r})$ is a polynomial in n and m of degree 3r
and that it is divisible by $nm(n + m + 1)$. Assuming that $E_{nm}(u^{2\alpha})$, $\alpha < r$,
is a polynomial in n and m of degree 3α divisible by $nm(n + m + 1)$ we will
show that it is possible to find a polynomial of degree 3r in n and m divisible by
$nm(n + m + 1)$ which satisfies the recurrence relation (9) for $E_{nm}(u^{2r})$ and
also its initial conditions, namely, $E_{n0}(u^{2r}) = E_{0m}(u^{2r}) = 0$.
The last condition is trivially satisfied if $E_{nm}(u^{2r})$ is divisible by $nm(n + m + 1)$.
Our method here is to actually substitute a polynomial with undetermined
coefficients into (9) and show that the coefficients can be obtained uniquely.
Rearranging (9) we obtain

(10)  $E_{nm}(u^{2r}) - \dfrac{n}{n + m}\, E_{n-1,m}(u^{2r}) - \dfrac{m}{n + m}\, E_{n,m-1}(u^{2r}) = \dfrac{1}{n + m} \sum_{\alpha=1}^{r} \binom{2r}{2\alpha} \dfrac{1}{2^{2\alpha}} \left[n\, m^{2\alpha} E_{n-1,m}(u^{2r-2\alpha}) + m\, n^{2\alpha} E_{n,m-1}(u^{2r-2\alpha})\right].$

Since for $\lambda < r$ we can write $E_{nm}(u^{2\lambda}) = nm(n + m + 1)\, P_{nm}^{(\lambda)}$ where $P_{nm}^{(\lambda)}$ is
a polynomial in n, m of degree $3\lambda - 3$ the above equation reduces to

(11)  $E_{nm}(u^{2r}) - \dfrac{n}{n + m}\, E_{n-1,m}(u^{2r}) - \dfrac{m}{n + m}\, E_{n,m-1}(u^{2r}) = nm\, Q_{nm},$

where $Q_{nm}$ is a polynomial in n, m of degree $3r - 3$.

Now let

$E_{nm}(u^{2r}) = nm(n + m + 1) \sum_{i, j \ge 0,\; i + j \le 3r - 3} a_{ij}\, n^i m^j$

where $a_{ij} = a_{ji}$ are to be determined. Substitution in (11) yields:

$\sum a_{ij} \left[(n + m + 1)\, n^i m^j - (n - 1)(n - 1)^i m^j - (m - 1)\, n^i (m - 1)^j\right] = Q_{nm}$

and rearrangement yields:

(12)  $\sum_{i, j \ge 0,\; i + j \le 3r - 3} a_{ij} \left[n^i m^j + \sum_{\alpha=0}^{i} \binom{i + 1}{\alpha} (-1)^{i - \alpha} \left(n^{\alpha} m^{j} + n^{j} m^{\alpha}\right)\right] = Q_{nm}.$

Consider first the terms of degree $3r - 3$. In this case $i + j = 3r - 3$ and
$\alpha = i$ will give

$\sum_{i=0}^{3r-3} a_{i,3r-3-i} \left[n^i m^{3r-3-i} + (i + 1)\left(n^i m^{3r-3-i} + n^{3r-3-i} m^i\right)\right] =$

(13)  $3r \sum_{i=0}^{3r-3} a_{i,3r-3-i}\, n^i m^{3r-3-i}.$

Equating the coefficients of these terms of degree $3r - 3$ to the corresponding
ones in $Q_{nm}$ it is possible to calculate the value of $a_{i,3r-3-i}$ ($i = 0, \cdots, 3r - 3$).

We assume now that the $a_{ij}$ are known for $i + j > 3r - 3 - k$ ($k \ge 1$) and
we will find the value of $a_{ij}$ where $i + j = 3r - 3 - k$. Consider then the terms
in (12) of degree $3r - 3 - k$. These terms will occur when

$i + j = 3r - 3,\ \alpha = i - k; \quad i + j = 3r - 4,\ \alpha = i - k + 1; \quad \cdots; \quad i + j = 3r - 3 - k,\ \alpha = i.$

All, but the last, contain coefficients which have already been evaluated. The
last one reduces to

$(3r - k) \sum_{i=0}^{3r-3-k} a_{i,3r-3-k-i}\, n^i m^{3r-3-k-i}.$

Thus by equating coefficients $a_{i,3r-3-k-i}$ for $i = 0, 1, \cdots, 3r - 3 - k$ can be
evaluated in terms of the coefficients $a_{ij}$ already known and those in $Q_{nm}$.
This concludes the proof that $E_{nm}(u^{2r})$ is a polynomial in n, m of degree 3r and
is divisible by $nm(n + m + 1)$.

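We now investigate the coefficients of the terms of degree 3r. For $\lambda = 1, 2$

$E_{nm}(u^{2\lambda}) = \dfrac{(2\lambda - 1) \cdots 5\cdot 3\cdot 1}{12^{\lambda}}\, (nm)^{\lambda} (n + m + 1)^{\lambda} + (\text{terms of degree} < 3\lambda).$

We assume this to hold for $\lambda < r$ and we will show that it holds for $\lambda = r$.
Substitution reduces the right side of (10) to

$\dfrac{1}{n + m} \binom{2r}{2} \dfrac{1}{4}\, \dfrac{(2r - 3) \cdots 5\cdot 3\cdot 1}{12^{r-1}} \left[n m^2 (n - 1)^{r-1} m^{r-1} (n + m)^{r-1} + m n^2\, n^{r-1} (m - 1)^{r-1} (n + m)^{r-1}\right] + (\text{terms of degree} < 3r - 1)$

or

$\dfrac{r(2r - 1)}{4}\, (n + m)^{r-2}\, \dfrac{(2r - 3) \cdots 5\cdot 3\cdot 1}{12^{r-1}} \left[n (n - 1)^{r-1} m^{r+1} + m (m - 1)^{r-1} n^{r+1}\right] + (\text{terms of degree} < 3r - 1)$

which reduces to

$\dfrac{3r\,(2r - 1) \cdots 5\cdot 3\cdot 1}{12^{r}}\, (nm)^r (n + m)^{r-1} + (\text{terms of degree} < 3r - 1).$

Comparison of coefficients with (13) multiplied by nm gives

$nm \sum_{i=0}^{3r-3} a_{i,3r-3-i}\, n^i m^{3r-3-i} = \dfrac{(2r - 1) \cdots 5\cdot 3\cdot 1}{12^{r}}\, (mn)^r (n + m)^{r-1}$

or

(14)  $E_{nm}(u^{2r}) = \dfrac{(2r - 1) \cdots 5\cdot 3\cdot 1}{12^{r}}\, (nm)^r (n + m + 1)^r + (\text{terms of degree} < 3r).$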




We now wish to show that $E_{nm}(u^{2r})$ is at most of degree 2r in n or m. For
r = 1, 2 this has already been established. Assuming that it is true for lower
moments the right side of (10), which reduces to $nm\, Q_{nm}$, is at most of degree
$2r - 1$ in n. We again compare coefficients in (12). First, for terms of degree
$3r - 3$ we have already seen that n has degree at most $2r - 2$. For terms
of degree $3r - 4$ we use $i + j = 3r - 3$, $\alpha = i - 1$ and $i + j = 3r - 4$, $\alpha = i$.
The first case gives rise to no terms in n of degree greater than $2r - 2$ so when we
solve for the coefficients $a_{i,3r-4-i}$ the coefficients of terms in n of degree greater
than $2r - 2$ must be zero. The process repeats and we find no terms in n or m
of degree greater than $2r - 2$ in the left side of (12). This gives $E_{nm}(u^{2r})$ at
most the degree 2r in n or m.

Now consider the ratio

$I = \dfrac{E_{nm}(u^{2r})}{[E_{nm}(u^2)]^r} = \dfrac{\dfrac{(2r - 1) \cdots 5\cdot 3\cdot 1}{12^r}\, (nm)^r (n + m + 1)^r}{[nm(n + m + 1)/12]^r} + \dfrac{(\text{terms of degree} < 3r;\ \text{in } n \text{ or } m,\ \le 2r)}{[nm(n + m + 1)/12]^r}$

$\quad = (2r - 1) \cdots 5\cdot 3\cdot 1 + \dfrac{(\text{terms of degree} < 3r;\ \text{in } n \text{ or } m,\ \le 2r)}{(nm)^r (n + m + 1)^r}.$

Hence

(15)  $\lim_{n, m \to \infty} I = (2r - 1) \cdots 5\cdot 3\cdot 1$

and by a well known theorem it follows from (15) that the limit distribution is
normal.
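For r = 2 the ratio I can be written down from the closed forms (5) and (8), and its approach to the normal value 3·1 = 3 can be observed directly. The sketch below is illustrative only; the function names are not from the paper.

```python
from fractions import Fraction


def second_moment(n, m):
    """E(u^2) from (5)."""
    return Fraction(n * m * (n + m + 1), 12)


def fourth_moment(n, m):
    """E(u^4) from (8)."""
    poly = 5 * n * n * m + 5 * n * m * m - 2 * n * n - 2 * m * m + 3 * n * m - 2 * n - 2 * m
    return Fraction(n * m * (n + m + 1), 240) * poly


def ratio(n, m):
    """The ratio I for r = 2, that is E(u^4) / [E(u^2)]^2."""
    return fourth_moment(n, m) / second_moment(n, m) ** 2


if __name__ == "__main__":
    for size in (8, 20, 100, 1000):
        print(size, float(ratio(size, size)))
```

Both n and m must grow for the limit to be reached; holding one of them fixed leaves the ratio bounded away from 3.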

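5. Consistency of the U test. If f and g are the cumulative distribution
functions of the x's and y's then our null hypothesis is f = g. The alternatives
admitted are f(a) > g(a) for every a. Let $E_A$ denote the expectation under the
alternative.

Defining

$x_{ij} = 0$ if $x_i < y_j$, $\quad x_{ij} = 1$ if $x_i > y_j$,

we have

$E_A(x_{ij}) = P(x_i > y_j) = \int_{-\infty}^{\infty} g\, df < \tfrac{1}{2},$

$E_A(x_{ij} x_{ik}) = P(x_i > y_j,\ x_i > y_k) = \int_{-\infty}^{\infty} g^2\, df < \tfrac{1}{3},$

$E_A(x_{ij} x_{hj}) = P(x_i > y_j,\ x_h > y_j) = \int_{-\infty}^{\infty} (1 - f)^2\, dg < \tfrac{1}{3}.$

We can now write

$E_A(x_{ij}) = \tfrac{1}{2} - \lambda, \quad E_A(x_{ij} x_{ik}) = \tfrac{1}{3} - \epsilon_1, \quad E_A(x_{ij} x_{hj}) = \tfrac{1}{3} - \epsilon_2,$

where $\lambda, \epsilon_1, \epsilon_2$ are positive numbers.

We have then

$\sigma_A^2(x_{ij}) = \tfrac{1}{4} - \lambda^2, \qquad \sigma_A(x_{ij} x_{ik}) = \tfrac{1}{12} - \epsilon_1 + \lambda - \lambda^2,$

$\sigma_A(x_{ij} x_{hk}) = 0$ for $i \ne h$, $j \ne k$, $\qquad \sigma_A(x_{ij} x_{hj}) = \tfrac{1}{12} - \epsilon_2 + \lambda - \lambda^2.$

Now

(16)  $E_A(U) = E_A\Big(\sum_{i,j} x_{ij}\Big) = nm/2 - \lambda nm$

and

(17)  $\sigma_A^2(U) = \sum \sigma_A^2(x_{ij}) + \sum \sigma_A(x_{ij} x_{ik}) + \sum \sigma_A(x_{ij} x_{hj}) + \sum \sigma_A(x_{ij} x_{hk})$

or

$\sigma_A^2(U) = nm(n + m + 1)/12 + nm\left[-\lambda^2 (n + m - 1) + (\lambda - \epsilon_1)(m - 1) + (\lambda - \epsilon_2)(n - 1)\right].$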
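Let the critical region under the null hypothesis consist of those U's satisfying
$nm/2 - U \ge t_n \sigma$ where $\lim_{n \to \infty} t_n = t$. Then

$P(nm/2 - U \ge t_n \sigma \mid A) = P(E_A(U) - U \ge k\sigma_A)$ where $k = (t_n \sigma - \lambda nm)/\sigma_A$,

and by Tchebycheff's inequality, since for large values of n, m, $k < 0$,

$P(nm/2 - U \ge t_n \sigma \mid A) \ge 1 - 1/k^2,$

which by (5) and (17) gives

$P(nm/2 - U \ge t_n \sigma \mid A) \ge 1 - \dfrac{\dfrac{nm(n + m + 1)}{12} + nm\left[-\lambda^2 (n + m - 1) + (\lambda - \epsilon_1)(m - 1) + (\lambda - \epsilon_2)(n - 1)\right]}{\left(t_n \sqrt{nm(n + m + 1)/12} - \lambda nm\right)^2}$

$\quad = 1 - \dfrac{1 + \dfrac{12}{n + m + 1}\left[-\lambda^2 (n + m - 1) + (\lambda - \epsilon_1)(m - 1) + (\lambda - \epsilon_2)(n - 1)\right]}{\left(t_n - \lambda \sqrt{\dfrac{12\, nm}{n + m + 1}}\right)^2}.$

We obtain then that

$\lim_{n, m \to \infty} P(nm/2 - U \ge t_n \sigma \mid A) = 1$

which is the requirement for consistency.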
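The Tchebycheff bound above can be evaluated for a concrete alternative. The sketch below assumes, purely for illustration, that x is uniform on (0, 1) and that g(a) = a² on (0, 1), so that λ = 1/6, ε₁ = 2/15 and ε₂ = 1/6 follow from the integrals of section 5; the value t = 1.645 and the function names are likewise assumptions, not part of the paper.

```python
import math
from fractions import Fraction

# Assumed alternative: f(a) = a and g(a) = a^2 on (0, 1).
LAM = Fraction(1, 2) - Fraction(1, 3)    # 1/2 - integral of g df
EPS1 = Fraction(1, 3) - Fraction(1, 5)   # 1/3 - integral of g^2 df
EPS2 = Fraction(1, 3) - Fraction(1, 6)   # 1/3 - integral of (1 - f)^2 dg


def null_variance(n, m):
    """sigma^2 under the null hypothesis, formula (5)."""
    return Fraction(n * m * (n + m + 1), 12)


def alternative_variance(n, m):
    """sigma_A^2(U) in the expanded form of (17)."""
    extra = -LAM**2 * (n + m - 1) + (LAM - EPS1) * (m - 1) + (LAM - EPS2) * (n - 1)
    return null_variance(n, m) + n * m * extra


def power_lower_bound(n, m, t=1.645):
    """Tchebycheff lower bound for P(nm/2 - U >= t*sigma | A); None if k >= 0."""
    gap = t * math.sqrt(null_variance(n, m)) - float(LAM) * n * m
    if gap >= 0:
        return None  # the inequality is not applicable yet
    return 1 - float(alternative_variance(n, m)) / gap**2


if __name__ == "__main__":
    for size in (10, 50, 200, 1000):
        print(size, power_lower_bound(size, size))
```

The bound is crude for moderate samples, but it increases toward 1 as n and m grow, which is all that consistency requires.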
6. Comparison with other tests. Another test which might seem appropriate
for the comparison of a control group with a group receiving treatment is the
test introduced by Wald and Wolfowitz [2]. The test by Wald and Wolfowitz is
consistent with respect to every alternative $f \ne g$. However in the case considered
we are only interested in the alternative hypothesis that measurements in the
group receiving treatment are stochastically larger than in the control group.
Intuitively, it seems that the test proposed here is more efficient for detecting the
particular alternative considered than the test proposed by Wald and Wolfowitz.
This intuitive feeling was borne out by the results of the test in the particular
experiment described in the introduction. All in all, 62 experiments were
conducted using various bacteria in different solutions and various amounts of
the protective drug. The U test gave 14 significant results on the 5% level
and 4 on the 1% level. The test of Wald and Wolfowitz gave 7 significant
results on the 5% level and 2 on the 1% level. A final decision between the two
tests can, of course, only be arrived at on the basis of their power functions,
which present formidable difficulties.

In comparing the two statistics it was noted that a slight dislocation of a
value may cause a significant change in the number of runs more easily than it can
cause a significant change in the statistic proposed here. For instance, in the
sequence $x_1 x_2 x_3 x_4 x_5 x_6 y_1 y_2 y_3 y_4 y_5 y_6$ both statistics would give a probability less than
.05. If however, the sequence is slightly altered to $x_1 x_2 x_3 x_4 x_5 y_1 x_6 y_2 y_3 y_4 y_5 y_6$,
P(number of runs ≤ 4) > .05 while P(U ≤ 1) = .002.
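The two tail probabilities quoted for the altered sequence can be reproduced by enumerating all orderings of six 0's (x's) and six 1's (y's). This is a brute-force sketch with illustrative function names; the runs count used here is simply the number of maximal blocks of equal symbols.

```python
from itertools import combinations
from math import comb


def arrangements(n, m):
    """Yield every ordering of n 0's and m 1's as a tuple."""
    size = n + m
    for ones in combinations(range(size), m):
        chosen = set(ones)
        yield tuple(1 if i in chosen else 0 for i in range(size))


def u_statistic(seq):
    """Number of (1, 0) pairs in which the 1 comes first."""
    ones_seen = 0
    total = 0
    for term in seq:
        if term == 1:
            ones_seen += 1
        else:
            total += ones_seen
    return total


def run_count(seq):
    """Number of runs: one more than the number of changes of symbol."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def tail_probabilities(n, m, u_max, runs_max):
    """Return (P(U <= u_max), P(runs <= runs_max)) under the null hypothesis."""
    u_hits = 0
    run_hits = 0
    for seq in arrangements(n, m):
        u_hits += u_statistic(seq) <= u_max
        run_hits += run_count(seq) <= runs_max
    total = comb(n + m, m)
    return u_hits / total, run_hits / total


if __name__ == "__main__":
    altered = (0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1)
    print(u_statistic(altered), run_count(altered))
    print(tail_probabilities(6, 6, 1, 4))
```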

After completion of the present paper it came to the authors' attention that
the U test had already been proposed by K. Ev. Mathen [3]. However Mathen's
distribution of U is incorrect and its derivation erroneous, since it assumes
independence of the random variables $x_{ij}$, as defined in section 5 of the present
paper, while obviously $x_{ij}$ and $x_{ik}$ are not independent.

ON THE CONVERGENCE OF SEQUENCES OF MOMENT
GENERATING FUNCTIONS

By W. Kozakiewicz

University of Saskatchewan

1. Summary. The purpose of this paper is to give a few theorems
concerning the reciprocal relation between the convergence of a sequence of
distribution functions and the convergence of the corresponding sequence of their
moment generating functions.

The paper consists of two parts. In the first part the univariate case is
discussed. The content of this part is closely related to that of a recent paper
by J. H. Curtiss [1, p. 430-433], but the results are of a somewhat more general
nature, and the methods of proofs are different and do not make use of the theory
of a complex variable. The second part deals with the multivariate case which,
as far as the author knows, has not been treated before with proofs in as
complete and rigorous a way.

In both the univariate and multivariate cases the proofs are based on the well
known Helly selection principle [2, p. 26] for bounded sequences of monotonic
functions.

2. The univariate case. Let X be a random variable and F(x) its distribution
function. That is, for any real x, $F(x) = P\{X \le x\}$, where $P\{X \le x\}$ denotes
the probability of the event $X \le x$. The function

$\varphi(t) = E(e^{tX}) = \int_{-\infty}^{+\infty} e^{tx}\, dF(x),$

in which the integral is taken in the Stieltjes-Riemann sense and is assumed to
converge in some neighborhood of the origin, is called the moment generating
function of X (or of F(x)).

Henceforth we use the abbreviations d.f. and m.g.f. for the terms distribution
function and moment generating function respectively. The variable t will
always be real.

Theorem 1. Let $\{F_n(x)\}$ be a sequence of d.f.'s. Let M(x) for any fixed
non-negative x be the least upper bound of the sequence $\{F_n(-x) + 1 - F_n(x)\}$.
If the sequence $\{F_n(x)\}$ converges on an everywhere dense set of points on the x-axis,
and if there exists a positive number $\alpha$ such that for any fixed t in the interval $|t| < \alpha$

(1)    $\lim_{x \to +\infty} e^{|t|x} M(x) = 0,$

then:

(a) there exists a d.f. F(x) such that $\lim_{n \to \infty} F_n(x) = F(x)$ at each point of continuity
of F(x);

(b) the m.g.f.'s of F(x) and $F_n(x)$, say $\varphi(t)$ and $\varphi_n(t)$, exist for $|t| < \alpha$;

(c) $\lim_{n \to \infty} \varphi_n(t) = \varphi(t)$ for $|t| < \alpha$ and uniformly in each interval $|t| \le \beta < \alpha$.

To prove (a), it may be noticed that there exists a function F(x), nondecreasing
and continuous on the right, such that $\lim_{n \to \infty} F_n(x) = F(x)$ at each point of
continuity of F(x). But F(x) must be a distribution function. Indeed, we
have for x > 0

(2)    $F(-x) + 1 - F(x) \le M(x-).$

Now from (1), putting t = 0, we find that M(x) and consequently M(x-)
approach zero as $x \to +\infty$. This proves that $F(-\infty) = 0$ and $F(+\infty) = 1$.

To prove (b), we notice first that the integral

$\varphi_n(t) = \int_{-\infty}^{+\infty} e^{xt}\, dF_n(x)$    (n = 1, 2, ...),

is convergent for $|t| < \alpha$. This follows immediately from (1) by applying the
method of integration by parts to the integrals

$\int_{0}^{N} e^{xt}\, dF_n(x)$  and  $\int_{-N}^{0} e^{xt}\, dF_n(x),$

which for any t in the interval $|t| < \alpha$ will be seen to be bounded for all values of
N. By the same argument, the relation $\lim_{x \to +\infty} M(x-) e^{|t|x} = 0$, $|t| < \alpha$, which
can be easily deduced from (1), together with (2) imply that the integral
representing $\varphi(t)$ is convergent for $|t| < \alpha$.

Let now $\beta$ be a positive number less than $\alpha$ and let $\gamma$ be such that $\beta < \gamma < \alpha$.
Let $M_\gamma$ be the least upper bound of $M(x)e^{\gamma x}$ for $x \ge 0$. Using the method of
integration by parts and applying (1) we have for $|t| \le \beta$

(3)    $\int_{N}^{+\infty} e^{xt}\, dF_n(x) = [1 - F_n(N)]\, e^{Nt} + t \int_{N}^{+\infty} e^{xt}[1 - F_n(x)]\, dx \le M(N) e^{N\beta} + \frac{M_\gamma\, \beta}{\gamma - \beta}\, e^{-(\gamma - \beta)N}.$

We could prove easily that the same inequality is true for the integrals

$\int_{-\infty}^{-N} e^{xt}\, dF_n(x), \quad \int_{N}^{+\infty} e^{xt}\, dF(x), \quad \int_{-\infty}^{-N} e^{xt}\, dF(x).$

Now let $\epsilon$ be any positive number. Because of (3), we have

(4)    $\int_{|x| > N_0} e^{xt}\, dF_n(x) < \epsilon, \quad \int_{|x| > N_0} e^{xt}\, dF(x) < \epsilon,$

for a sufficiently great number $N_0$, and uniformly with respect to n and t, when
$|t| \le \beta$. Clearly, $N_0$ can be so chosen that F(x) is continuous for $x = \pm N_0$.
Then

(5)    $\lim_{n \to \infty} \int_{-N_0}^{N_0} e^{xt}\, dF_n(x) = \int_{-N_0}^{N_0} e^{xt}\, dF(x),$

uniformly for $|t| \le \beta$.

The relations (4) and (5) prove that $\varphi_n(t) \to \varphi(t)$ as $n \to \infty$, uniformly for
$|t| \le \beta$. But $\beta$ can be chosen as near to $\alpha$ as we please; thus (c) is proved.
Theorem 2. Let $\{F_n(x)\}$ be a sequence of d.f.'s and $\{\varphi_n(t)\}$ the corresponding
sequence of m.g.f.'s. If $\varphi_n(t)$ exists for $|t| < \alpha$, and if there exists a finite valued
function $\varphi(t)$ defined for $|t| < \alpha$, such that $\lim_{n \to \infty} \varphi_n(t) = \varphi(t)$ for $|t| < \alpha$, then

(a) $\lim_{x \to +\infty} M(x) e^{|t|x} = 0$ for $|t| < \alpha$;

(b) there exists a d.f. F(x) such that $\lim_{n \to \infty} F_n(x) = F(x)$ at each point of continuity
of F(x);

(c) the m.g.f. of F(x) exists for $|t| < \alpha$ and is identically equal to $\varphi(t)$ in this interval;

(d) $\lim_{n \to \infty} \varphi_n(t) = \varphi(t)$ uniformly in each interval $|t| \le \beta < \alpha$.

To prove (a), let t be a number in the interval $|t| < \alpha$, and let $\beta$ be chosen so
that $|t| < \beta < \alpha$. Then, for x > 0, we have

$F_n(-x) + 1 - F_n(x) = \int_{-\infty}^{-x} dF_n(u) + \int_{x}^{+\infty} dF_n(u)$

$\le e^{-\beta x} \int_{-\infty}^{-x} e^{-\beta u}\, dF_n(u) + e^{-\beta x} \int_{x}^{+\infty} e^{\beta u}\, dF_n(u)$

$\le e^{-\beta x} [\varphi_n(-\beta) + \varphi_n(\beta)].$

Consequently

$M(x) e^{|t|x} \le e^{-(\beta - |t|)x}\ \mathrm{l.u.b.}_n\, [\varphi_n(-\beta) + \varphi_n(\beta)],$

and since the sequences $\{\varphi_n(-\beta)\}$ and $\{\varphi_n(\beta)\}$ are convergent, and therefore
bounded, it follows that $M(x)e^{|t|x}$ approaches zero as $x \to +\infty$.
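The inequality used in the proof of (a) can be illustrated for a single d.f. The sketch below takes the standard normal distribution as an assumed example, for which the m.g.f. is $e^{t^2/2}$, and compares the two-sided tail $F(-x) + 1 - F(x)$ with the bound $e^{-\beta x}[\varphi(-\beta) + \varphi(\beta)]$. The function names are chosen here for illustration only.

```python
import math


def two_sided_tail(x):
    """F(-x) + 1 - F(x) for the standard normal d.f. (equals erfc(x / sqrt(2)))."""
    return math.erfc(x / math.sqrt(2))


def exponential_bound(x, beta):
    """e^{-beta x} [phi(-beta) + phi(beta)] with phi(t) = exp(t^2 / 2)."""
    return math.exp(-beta * x) * 2.0 * math.exp(beta ** 2 / 2.0)


def bound_holds(xs, betas):
    """Check the tail inequality on a grid of x > 0 and beta > 0."""
    return all(two_sided_tail(x) <= exponential_bound(x, b) for x in xs for b in betas)
```

Since the bound decays like $e^{-\beta x}$ for every admissible $\beta$, multiplying by $e^{|t|x}$ with $|t| < \beta$ still leaves a quantity tending to zero, which is the content of (a).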

To prove (b) we may notice that by the Helly selection principle we can
choose a subsequence $\{F_{n_k}(x)\}$ which is convergent to some non-decreasing
function F(x), at each point of continuity of F(x). Now Theorem 1 together
with (a) imply that F(x) is a d.f. and that the limit of the subsequence $\{\varphi_{n_k}(t)\}$,
namely $\varphi(t)$, must be identical, for $|t| < \alpha$, with the m.g.f. of F(x). By the
uniqueness property of a m.g.f. we know that F(x) is uniquely determined by
$\varphi(t)$, and therefore it follows that every convergent subsequence of $\{F_n(x)\}$
approaches the same limit F(x) at each point of continuity of F(x). This is,
however, equivalent to the statement that the sequence $\{F_n(x)\}$ itself converges
to F(x) at each point of continuity of F(x). Thus (b) is proved. We see at
once that (c) and (d) follow immediately from Theorem 1.

Theorem 2 is of course similar to Theorem 3 in the paper of Curtiss [1,
p. 432]. The proof of (a), however, is not contained in his paper. From
Theorems 1 and 2 there follows immediately

Theorem 3. Let $\{F_n(x)\}$ be a sequence of d.f.'s, and let $\{\varphi_n(t)\}$ be the corresponding
sequence of m.g.f.'s, which are all assumed to exist for $|t| < \alpha$. The necessary
and sufficient conditions for the convergence of $\{\varphi_n(t)\}$ in the interval $|t| < \alpha$ are:

(a) $\lim_{x \to +\infty} M(x) e^{|t|x} = 0$, $|t| < \alpha$;

(b) the sequence $\{F_n(x)\}$ converges to a d.f. F(x) at each point of continuity of F(x).

Further, the m.g.f. of F(x) exists for $|t| < \alpha$ and is equal in this interval to the limit
of the sequence $\{\varphi_n(t)\}$.

In his paper Curtiss gives an example of a sequence $\{F_n(x)\}$ of d.f.'s which
converges to a d.f. F(x), while the corresponding sequence $\{\varphi_n(t)\}$ of m.g.f.'s does
not converge to the m.g.f. $\varphi(t)$ of the d.f. F(x), though both $\varphi_n(t)$, (n = 1, 2, ...),
and $\varphi(t)$ exist for all t. It may be easily proved by the direct method that in the
case considered the condition (a) of Theorem 3 is not satisfied.

It is perhaps worth while to notice that the condition (a) of Theorem 3 may
be expressed also as follows:

$\limsup_{x \to +\infty} x^{-1} \log M(x) \le -\alpha.$
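A failure of condition (a) is easy to exhibit numerically. The family below is not Curtiss's example; it is a simpler two-point family assumed here for illustration, in which $F_n$ puts mass $1 - 1/n$ at 0 and mass $1/n$ at the point n. These d.f.'s converge to the d.f. of the constant 0, whose m.g.f. is identically 1, yet $\varphi_n(t) \to \infty$ for every t > 0.

```python
import math


def mgf_n(n, t):
    """m.g.f. of the law with mass 1 - 1/n at 0 and mass 1/n at the point n."""
    return (1.0 - 1.0 / n) + math.exp(t * n) / n


def tail_sup(x):
    """M(x) = l.u.b. over n of F_n(-x) + 1 - F_n(x) for this family, x >= 0.

    For x > 0, F_n(-x) = 0 and 1 - F_n(x) equals 1/n when n > x and 0 otherwise,
    so the least upper bound is attained at the smallest integer n exceeding x.
    For x = 0 the sum equals 1 for every n, which the same formula reproduces.
    """
    return 1.0 / (math.floor(x) + 1)


def log_rate(x):
    """x^{-1} log M(x); condition (a) would need this to stay below some -alpha < 0."""
    return math.log(tail_sup(x)) / x
```

Here M(x) decays only like 1/x, so $e^{|t|x} M(x)$ is unbounded for every $t \ne 0$ and $x^{-1} \log M(x)$ tends to 0, which matches the divergence of `mgf_n(n, t)` for t > 0.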

3. The multivariate case. For the sake of simplicity we shall consider here
the bivariate case only. The results obtained in this chapter can, however, be
easily extended to the case when d.f.'s and m.g.f.'s are defined in the Euclidean
space of any finite number of dimensions.

Let $(X_1, X_2)$ be a random vector variable in the two-dimensional Euclidean
space, and let $F(x_1, x_2)$ be its d.f. That is, for any real numbers $x_1$ and $x_2$,

$F(x_1, x_2) = P\{X_1 \le x_1,\ X_2 \le x_2\}.$

Let

$F_1(x_1) = P\{X_1 \le x_1\} = F(x_1, +\infty),$

$F_2(x_2) = P\{X_2 \le x_2\} = F(+\infty, x_2);$

then $F_1(x_1)$ and $F_2(x_2)$ are called the marginal d.f.'s of $X_1$ and $X_2$ respectively.
The m.g.f.'s of the d.f.'s $F(x_1, x_2)$, $F_1(x_1)$ and $F_2(x_2)$ are defined by the equations:

$\varphi(t_1, t_2) = E(e^{X_1 t_1 + X_2 t_2}) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} e^{x_1 t_1 + x_2 t_2}\, dF(x_1, x_2),$

$\varphi_i(t_i) = E(e^{X_i t_i}) = \int_{-\infty}^{+\infty} e^{x_i t_i}\, dF_i(x_i),$    (i = 1, 2),

in which the integrals are assumed to converge in some neighborhood of the
origin. It is easy to see that $\varphi_1(t_1) = \varphi(t_1, 0)$ and $\varphi_2(t_2) = \varphi(0, t_2)$.

Theorem 4. Let $\varphi(t_1, t_2)$ and $\varphi^*(t_1, t_2)$ be the m.g.f.'s of d.f.'s $F(x_1, x_2)$ and
$F^*(x_1, x_2)$ respectively. If $\varphi(t_1, t_2)$ and $\varphi^*(t_1, t_2)$ exist and are equal in some
neighborhood of the origin $|t_i| < \alpha_i$, (i = 1, 2), then $F(x_1, x_2) = F^*(x_1, x_2)$
identically.

To prove this theorem, let us introduce two random vector variables $(X_1, X_2)$
and $(X_1^*, X_2^*)$ of which the d.f.'s are respectively F and F*. Consider now two
random variables

$Z = X_1 t_1 + X_2 t_2, \quad Z^* = X_1^* t_1 + X_2^* t_2,$

where $t_1$ and $t_2$ denote two real numbers not both zero. If $\varphi(t)$ and $\varphi^*(t)$ are
respectively the m.g.f.'s of Z and Z*, we have

$\varphi(t) = \varphi(t t_1, t t_2), \quad \varphi^*(t) = \varphi^*(t t_1, t t_2).$

Consequently $\varphi(t) = \varphi^*(t)$ provided that $|t t_i| < \alpha_i$, (i = 1, 2). It follows from
the uniqueness property of the m.g.f. in the univariate case that the d.f.'s of
Z and Z* must be identical. Now, according to a theorem due to Cramér
[3, p. 105], if the d.f.'s of Z and Z* coincide for all pairs of values $(t_1, t_2)$ such that
$|t_1| + |t_2| \ne 0$, the d.f.'s F and F* must be identical. It may be worth while to
reproduce here Cramér's proof. Let $\psi(t_1, t_2) = E(e^{i(X_1 t_1 + X_2 t_2)})$ and $\psi^*(t_1, t_2) =
E(e^{i(X_1^* t_1 + X_2^* t_2)})$ be the characteristic functions of F and F* respectively.
Then $\psi(t t_1, t t_2)$ and $\psi^*(t t_1, t t_2)$ are the characteristic functions of Z and Z*
respectively. Since Z and Z* have the same d.f.'s, it follows that $\psi(t t_1, t t_2) =
\psi^*(t t_1, t t_2)$ for all values of t. Putting t = 1, we find that $\psi(t_1, t_2) = \psi^*(t_1, t_2)$
if $|t_1| + |t_2| \ne 0$. For $t_1 = t_2 = 0$, $\psi(0, 0) = \psi^*(0, 0) = 1$. Therefore $\psi(t_1, t_2) =
\psi^*(t_1, t_2)$ identically, and since the characteristic function uniquely determines
the d.f., it follows that the d.f.'s F and F* are identical.
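The identity $\varphi(t) = \varphi(t t_1, t t_2)$ used at the start of the proof can be checked directly on a small discrete law. The joint distribution below is hypothetical and chosen only for illustration; the m.g.f. of $Z = X_1 t_1 + X_2 t_2$ is computed from the law of Z itself and compared with the bivariate m.g.f. evaluated along the ray $(t t_1, t t_2)$.

```python
import math

# Hypothetical discrete bivariate law: {(x1, x2): probability}.
JOINT = {(0, 0): 0.2, (1, 0): 0.3, (0, 2): 0.1, (1, 2): 0.4}


def mgf2(t1, t2, joint=JOINT):
    """Bivariate m.g.f. E exp(X1 t1 + X2 t2)."""
    return sum(p * math.exp(x1 * t1 + x2 * t2) for (x1, x2), p in joint.items())


def mgf_of_combination(t, t1, t2, joint=JOINT):
    """m.g.f. at t of Z = X1 t1 + X2 t2, computed from the law of Z."""
    law_z = {}
    for (x1, x2), p in joint.items():
        z = x1 * t1 + x2 * t2
        law_z[z] = law_z.get(z, 0.0) + p  # merge atoms that map to the same z
    return sum(p * math.exp(t * z) for z, p in law_z.items())
```

The two quantities `mgf_of_combination(t, t1, t2)` and `mgf2(t * t1, t * t2)` agree up to rounding for any real arguments, since the law here has bounded support.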

Theorem 5. Let $\{F_n(x_1, x_2)\}$ be a sequence of d.f.'s. Let $F_{1n}(x_1)$ and $F_{2n}(x_2)$
be respectively the marginal d.f.'s determined by $F_n(x_1, x_2)$. Let

$M_i(x_i) = \mathrm{l.u.b.}_n\, \{F_{in}(-x_i) + 1 - F_{in}(x_i)\},$

where $x_i \ge 0$, (i = 1, 2). If there exist positive numbers $\alpha_1$ and $\alpha_2$ such that for
$|t_i| < \alpha_i$

(6)    $\lim_{x_i \to +\infty} M_i(x_i)\, e^{|t_i| x_i} = 0,$    (i = 1, 2),

and if $\{F_n(x_1, x_2)\}$ converges on an everywhere dense set on the $(x_1, x_2)$ plane,
then:

(a) there exists a d.f. $F(x_1, x_2)$ such that $\lim_{n \to \infty} F_n(x_1, x_2) = F(x_1, x_2)$ at each point
of continuity of $F(x_1, x_2)$;

(b) there exist two positive numbers $\delta_1$ and $\delta_2$, $\delta_i \le \alpha_i$, such that the m.g.f.'s of
$F(x_1, x_2)$ and $F_n(x_1, x_2)$, say $\varphi(t_1, t_2)$ and $\varphi_n(t_1, t_2)$, exist for $|t_i| < \delta_i$, (i = 1, 2);

(c) $\lim_{n \to \infty} \varphi_n(t_1, t_2) = \varphi(t_1, t_2)$ for $|t_i| < \delta_i$, and uniformly in each two-dimensional
interval $|t_i| \le \beta_i < \delta_i$, (i = 1, 2).

To prove (a), we notice that there obviously exists a function $F(x_1, x_2)$,
continuous on the right with respect to each variable, satisfying the relation

$\Delta^2 F(x_1, x_2) = F(x_1'', x_2'') + F(x_1', x_2') - F(x_1', x_2'') - F(x_1'', x_2') \ge 0$

for $x_1' < x_1''$, $x_2' < x_2''$, and such that

(7)    $\lim_{n \to \infty} F_n(x_1, x_2) = F(x_1, x_2)$

at each point of continuity of $F(x_1, x_2)$. We shall prove that $F(x_1, x_2)$ is a d.f.
In fact, it is easy to see that we have for $x_i > 0$, (i = 1, 2),

(8)    $F(-x_1, -x_2) \le F(-x_1, x_2) \le M_1(x_1-), \quad F(x_1, -x_2) \le M_2(x_2-),$

       $1 - F(x_1, x_2) \le M_1(x_1) + M_2(x_2).$

Now, according to (6), $\lim_{x_i \to +\infty} M_i(x_i-) = \lim_{x_i \to +\infty} M_i(x_i) = 0$, (i = 1, 2), therefore it
follows from (8) that $F(-\infty, -\infty) = F(-\infty, x_2) = F(x_1, -\infty) = 0$ and
$F(+\infty, +\infty) = 1$, which proves that $F(x_1, x_2)$ is a d.f.

To prove (b), let $\varphi_{in}(t_i)$ be the m.g.f. of the d.f. $F_{in}(x_i)$, (i = 1, 2). Let
$F_1(x_1)$ and $F_2(x_2)$ be the marginal d.f.'s determined by $F(x_1, x_2)$ and let $\varphi_i(t_i)$ be
the m.g.f. of $F_i(x_i)$, (i = 1, 2).

Now let N' > N > 0 and

$R_n(N, N', t_1, t_2) = \int_{-N'}^{N'} \int_{-N'}^{N'} e^{x_1 t_1 + x_2 t_2}\, dF_n(x_1, x_2) - \int_{-N}^{N} \int_{-N}^{N} e^{x_1 t_1 + x_2 t_2}\, dF_n(x_1, x_2)$

$= \left[ \int_{N}^{N'} \int_{-N'}^{N'} + \int_{-N'}^{-N} \int_{-N'}^{N'} + \int_{-N}^{N} \int_{N}^{N'} + \int_{-N}^{N} \int_{-N'}^{-N} \right] e^{x_1 t_1 + x_2 t_2}\, dF_n(x_1, x_2)$

$= I_1 + I_2 + I_3 + I_4,$

where in each double integral the outer limits refer to $x_1$ and the inner limits to $x_2$.

Applying the Schwarz inequality to $I_1$, we find

(9)    $I_1 \le \left[ \int_{N}^{N'} \int_{-N'}^{N'} e^{2 x_1 t_1}\, dF_n(x_1, x_2) \right]^{1/2} \left[ \int_{N}^{N'} \int_{-N'}^{N'} e^{2 x_2 t_2}\, dF_n(x_1, x_2) \right]^{1/2}.$

But

(10)    $\int_{N}^{N'} \int_{-N'}^{N'} e^{2 x_1 t_1}\, dF_n(x_1, x_2) \le \int_{N}^{N'} e^{2 x_1 t_1}\, dF_{1n}(x_1),$

and similarly

(11)    $\int_{N}^{N'} \int_{-N'}^{N'} e^{2 x_2 t_2}\, dF_n(x_1, x_2) \le \int_{-\infty}^{+\infty} e^{2 x_2 t_2}\, dF_{2n}(x_2).$

Let $\epsilon$ be any positive number and $\gamma_i$ a positive number less than $\alpha_i$, (i = 1, 2).
It follows from the proof of Theorem 1, taking into account (6), that the
integrals representing $\varphi_{in}(t_i)$ and $\varphi_i(t_i)$, (i = 1, 2), exist and are uniformly
convergent with respect to n and $t_i$, when $|t_i| \le \gamma_i$, (i = 1, 2). Consequently
we have

(12)    $\int_{|x_i| > N} e^{x_i t_i}\, dF_{in}(x_i) < \epsilon, \quad \int_{|x_i| > N} e^{x_i t_i}\, dF_i(x_i) < \epsilon,$    (i = 1, 2),

uniformly with respect to n and $t_i$ when $|t_i| \le \gamma_i$, (i = 1, 2), provided that N is
sufficiently large, say $N > N_0$. Let us take $\beta_i = \gamma_i/2$, (i = 1, 2). The integrals
representing $\varphi_{in}(t_i)$ and $\varphi_i(t_i)$, (i = 1, 2), are obviously uniformly bounded for all
n and $t_i$ when $|t_i| \le \gamma_i$, (i = 1, 2); they are all less than some constant C.
Consequently, taking into account (9), (10), (11), and (12), we find

$I_1 < \sqrt{C\epsilon},$

uniformly with respect to n and $t_i$ when $|t_i| \le \beta_i$, (i = 1, 2), provided that
$N' > N > N_0$. Since the same inequality is true for $I_2$, $I_3$ and $I_4$, we have

(13)    $R_n(N, N', t_1, t_2) < 4\sqrt{C\epsilon},$

uniformly with respect to n and $t_i$, when $|t_i| \le \beta_i$, (i = 1, 2), provided $N' >
N > N_0$. Hence the integral representing $\varphi_n(t_1, t_2)$ is uniformly convergent for
$|t_i| \le \beta_i$, and consequently convergent for $|t_i| < \alpha_i/2$, (i = 1, 2), since $\beta_i$
can be chosen as near to $\alpha_i/2$ as we please.

Similarly, using (12), we could find

(14)    $R(N, N', t_1, t_2) < 4\sqrt{C\epsilon}, \quad |t_i| \le \beta_i, \quad N' > N > N_0,$

where

$R(N, N', t_1, t_2) = \int_{-N'}^{N'} \int_{-N'}^{N'} e^{x_1 t_1 + x_2 t_2}\, dF(x_1, x_2) - \int_{-N}^{N} \int_{-N}^{N} e^{x_1 t_1 + x_2 t_2}\, dF(x_1, x_2).$

This proves, in turn, that the integral representing $\varphi(t_1, t_2)$ is uniformly
convergent for $|t_i| \le \beta_i$ and convergent for $|t_i| < \alpha_i/2$, (i = 1, 2). Thus (b) is
proved with $\delta_i = \alpha_i/2$, (i = 1, 2).

To prove (c), let $N' \to +\infty$ and $N = N_0$ in (13) and (14). We obtain

(15)    $R_n(N_0, +\infty, t_1, t_2) \le 4\sqrt{C\epsilon}, \quad R(N_0, +\infty, t_1, t_2) \le 4\sqrt{C\epsilon},$

uniformly with respect to n and $t_i$ when $|t_i| \le \beta_i$.

Clearly, $N_0$ can be chosen so that $F_1(x_1)$ and $F_2(x_2)$ are continuous for $x_1 =
x_2 = \pm N_0$. Then

(16)    $\lim_{n \to \infty} \int_{-N_0}^{N_0} \int_{-N_0}^{N_0} e^{x_1 t_1 + x_2 t_2}\, dF_n(x_1, x_2) = \int_{-N_0}^{N_0} \int_{-N_0}^{N_0} e^{x_1 t_1 + x_2 t_2}\, dF(x_1, x_2),$

uniformly for $|t_i| \le \beta_i$, (i = 1, 2).

The relations (15) and (16) prove that

$\lim_{n \to \infty} \varphi_n(t_1, t_2) = \varphi(t_1, t_2),$

uniformly for $|t_i| \le \beta_i$, (i = 1, 2). The ordinary convergence obviously holds
for $|t_i| < \alpha_i/2$, (i = 1, 2).

It follows from the above proof, which refers to the bivariate case, that we
may take $\delta_i = \alpha_i/2$, (i = 1, 2), in (b) and (c).

The existence of the corresponding numbers $\delta_i$, $\delta_i \le \alpha_i$, (i = 1, 2, ..., k),
in the k-variate case can be easily established by the repeated application of the
Schwarz inequality.

Theorem 6. Let $\varphi_n(t_1, t_2)$, $\varphi_{in}(t_i)$, $F_n(x_1, x_2)$, $F_{in}(x_i)$ and $M_i(x_i)$, (i = 1, 2),
have the same meaning as in Theorem 5. If $\varphi_n(t_1, t_2)$ exist for $|t_i| < \alpha_i$,
(i = 1, 2), and if there exists a finite valued function $\varphi(t_1, t_2)$ defined for $|t_i| < \alpha_i$,
such that $\lim_{n \to \infty} \varphi_n(t_1, t_2) = \varphi(t_1, t_2)$, $|t_i| < \alpha_i$,

then

(a) $\lim_{x_i \to +\infty} M_i(x_i)\, e^{|t_i| x_i} = 0$ for $|t_i| < \alpha_i$, (i = 1, 2);

(b) there exists a d.f. $F(x_1, x_2)$ such that $\lim_{n \to \infty} F_n(x_1, x_2) = F(x_1, x_2)$ at each point of
continuity of $F(x_1, x_2)$;

(c) the m.g.f. of $F(x_1, x_2)$ exists for $|t_i| < \alpha_i$ and is identically equal to $\varphi(t_1, t_2)$ for
$|t_i| < \alpha_i$, (i = 1, 2);

(d) $\lim_{n \to \infty} \varphi_n(t_1, t_2) = \varphi(t_1, t_2)$ uniformly for $|t_i| \le \beta_i < \alpha_i$, (i = 1, 2).

To prove (a), it is sufficient to notice that $\varphi_{1n}(t_1) = \varphi_n(t_1, 0)$ and $\varphi_{2n}(t_2) =
\varphi_n(0, t_2)$. Consequently we have

$\lim_{n \to \infty} \varphi_{1n}(t_1) = \varphi(t_1, 0), \quad \lim_{n \to \infty} \varphi_{2n}(t_2) = \varphi(0, t_2), \quad |t_i| < \alpha_i,$    (i = 1, 2).

Therefore (a) follows immediately from Theorem 2.

To prove (b), we may notice that according to the Helly principle of selection
applied to the sequence $\{F_n(x_1, x_2)\}$, there exists a subsequence $\{F_{n_k}(x_1, x_2)\}$,
selected from the sequence $\{F_n(x_1, x_2)\}$, which is convergent to some function
$F(x_1, x_2)$ continuous on the right and with non-negative second difference.
But $F(x_1, x_2)$ must be a d.f. according to Theorem 5, since the relation (6) is
satisfied by the sequence $\{F_{n_k}(x_1, x_2)\}$. Moreover, the limit of the sequence
$\{\varphi_{n_k}(t_1, t_2)\}$, namely $\varphi(t_1, t_2)$, when considered in a sufficiently small neighborhood
of the origin, is the m.g.f. of $F(x_1, x_2)$. Since the d.f. is uniquely determined by
its m.g.f., it follows that every convergent subsequence of $\{F_n(x_1, x_2)\}$ converges
to the same limit $F(x_1, x_2)$ at each point of continuity of $F(x_1, x_2)$. This
is, however, the same as to say that the sequence $\{F_n(x_1, x_2)\}$ itself converges to
$F(x_1, x_2)$ at each point of continuity of $F(x_1, x_2)$.

To prove (c), we have to show that the m.g.f. of $F(x_1, x_2)$, say $\varphi^*(t_1, t_2)$, exists
for $|t_i| < \alpha_i$ and is equal to $\varphi(t_1, t_2)$, (i = 1, 2). (We have proved that
$\varphi^*(t_1, t_2) = \varphi(t_1, t_2)$ only for sufficiently small values of $|t_1|$ and $|t_2|$.) The
existence of $\varphi^*(t_1, t_2)$ for $|t_i| < \alpha_i$, (i = 1, 2), can be easily established by the
method used by Curtiss [1, p. 433]. Suppose indeed that $\varphi^*(t_1, t_2)$ does not
exist at some point $(t_1^0, t_2^0)$, where $|t_i^0| < \alpha_i$, (i = 1, 2). That means that we
can find a positive number N such that

(17)    $\int_{-N}^{N} \int_{-N}^{N} e^{x_1 t_1^0 + x_2 t_2^0}\, dF(x_1, x_2) > \varphi(t_1^0, t_2^0).$

Since $\lim_{n \to \infty} F_n(x_1, x_2) = F(x_1, x_2)$ at all points of continuity of $F(x_1, x_2)$, and since
N can be so chosen that the marginal d.f.'s $F_1(x_1)$ and $F_2(x_2)$ are continuous for
$x_1 = x_2 = \pm N$, it follows that

(18)    $\lim_{n \to \infty} \int_{-N}^{N} \int_{-N}^{N} e^{x_1 t_1^0 + x_2 t_2^0}\, dF_n(x_1, x_2) = \int_{-N}^{N} \int_{-N}^{N} e^{x_1 t_1^0 + x_2 t_2^0}\, dF(x_1, x_2).$

The formulas (17) and (18) give $\lim_{n \to \infty} \varphi_n(t_1^0, t_2^0) > \varphi(t_1^0, t_2^0)$, which is impossible
because $\lim_{n \to \infty} \varphi_n(t_1, t_2) = \varphi(t_1, t_2)$ for $|t_i| < \alpha_i$, (i = 1, 2).

To prove that $\varphi(t_1, t_2) = \varphi^*(t_1, t_2)$ for $|t_i| < \alpha_i$, (i = 1, 2), let $(t_1, t_2)$ denote
a fixed point such that $|t_i| < \alpha_i$, (i = 1, 2). Clearly, $\varphi_n(t t_1, t t_2)$, (n = 1, 2, ...),
and $\varphi^*(t t_1, t t_2)$, considered as functions of the variable t, are m.g.f.'s provided that
$|t t_i| < \alpha_i$, (i = 1, 2). (See first part of proof of Theorem 4.) Now, according
to Theorem 2, the limit of the sequence $\{\varphi_n(t t_1, t t_2)\}$, namely $\varphi(t t_1, t t_2)$, $|t t_i| < \alpha_i$,
(i = 1, 2), is also a m.g.f. Since $\varphi(t t_1, t t_2) = \varphi^*(t t_1, t t_2)$ in a sufficiently small
interval containing the point t = 0, it follows from the uniqueness property of the
m.g.f. in the univariate case that $\varphi(t t_1, t t_2) = \varphi^*(t t_1, t t_2)$ identically for $|t t_i| < \alpha_i$,
(i = 1, 2). Putting t = 1, we find $\varphi(t_1, t_2) = \varphi^*(t_1, t_2)$, $|t_i| < \alpha_i$, (i = 1, 2).

Thus (c) is completely proved.

To prove (d), it is sufficient to notice that the sequence $\{\varphi_n(t_1, t_2)\}$ is uniformly
continuous in each two-dimensional interval $|t_i| \le \beta_i < \alpha_i$, (i = 1, 2), (that
is, for any $\epsilon > 0$, there exists a positive number $\delta = \delta(\epsilon)$ such that

$|\varphi_n(t_1', t_2') - \varphi_n(t_1'', t_2'')| < \epsilon$

if

$|t_i' - t_i''| < \delta, \quad |t_i'| \le \beta_i, \quad |t_i''| \le \beta_i,$    (i = 1, 2), (n = 1, 2, ...)).

Consequently, the sequence $\{\varphi_n(t_1, t_2)\}$, which is convergent for $|t_i| \le \beta_i$, must
be uniformly convergent if $|t_i| \le \beta_i$, (i = 1, 2).

A GENERALIZATION OF TSHEBYSHEV'S INEQUALITY TO TWO
DIMENSIONS

By Z. W. Birnbaum, J. Raymond, and H. S. Zuckerman
University of Washington

1. Let $X_1, X_2, \ldots, X_n$ be independent random variables with expectations
$E(X_j) = e_j$ and variances $\sigma^2(X_j) = \sigma_j^2$ for $j = 1, 2, \ldots, n$. The question
may be asked: What is the upper bound for the probability
that the point $(X_1, X_2, \ldots, X_n)$ does not fall inside of the ellipsoid

$\sum_{j=1}^{n} \frac{(x_j - e_j)^2}{t_j^2} = 1?$

For n = 1 the answer to this question is given by Tshebyshev's inequality

(1.1)    $P\left( \frac{(X - E(X))^2}{t^2} \ge 1 \right) \le \frac{\sigma^2(X)}{t^2},$

which can not be improved without further assumptions. By a trivial
generalization of the argument leading to (1.1) one can prove the inequality

(1.2)    $P\left( \sum_{j=1}^{n} \frac{(X_j - e_j)^2}{t_j^2} \ge 1 \right) \le \sum_{j=1}^{n} \frac{\sigma_j^2}{t_j^2}$

for any integer n. This inequality, however, can be improved for $n \ge 2$. In
particular, for n = 2, the following theorem will be proved:

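Theorem 1.1. Let X and Y be independent random variables, with expectations
$E(X) = x_0$, $E(Y) = y_0$ and variances $\sigma_x^2$, $\sigma_y^2$. Then, for any s > 0, t > 0
such that $\frac{\sigma_x^2}{s^2} \le \frac{\sigma_y^2}{t^2}$ we have

(1.3)    $P\left( \frac{(X - x_0)^2}{s^2} + \frac{(Y - y_0)^2}{t^2} \ge 1 \right) \le L(s, t)$

where

(1.4)    $L(s, t) = 1$    if $\frac{\sigma_x^2}{s^2} + \frac{\sigma_y^2}{t^2} \ge 1$,

         $L(s, t) = \frac{\sigma_y^2 / t^2}{1 - \sigma_x^2 / s^2}$    if $\frac{\sigma_x^2}{s^2} + \frac{\sigma_y^2}{t^2} \le 1 \le \frac{1}{2}\left( \frac{\sigma_x^2}{s^2} + \frac{2\sigma_y^2}{t^2} + \sqrt{\frac{\sigma_x^4}{s^4} + \frac{4\sigma_y^4}{t^4}} \right)$,

         $L(s, t) = \frac{\sigma_x^2}{s^2} + \frac{\sigma_y^2}{t^2} - \frac{\sigma_x^2 \sigma_y^2}{s^2 t^2}$    if $\frac{1}{2}\left( \frac{\sigma_x^2}{s^2} + \frac{2\sigma_y^2}{t^2} + \sqrt{\frac{\sigma_x^4}{s^4} + \frac{4\sigma_y^4}{t^4}} \right) \le 1$.

For any given $\sigma_x^2$, $\sigma_y^2$, s > 0, t > 0 such that $\frac{\sigma_x^2}{s^2} \le \frac{\sigma_y^2}{t^2}$ there exist independent random
variables X and Y with the variances $\sigma_x^2$, $\sigma_y^2$, such that the equality sign is true in
(1.3).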
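The gain of Theorem 1.1 over the elementary bound (1.2) can be seen by evaluating both. The sketch below is an illustrative reading of (1.4); the function names and the choice to cap the bound (1.2) at 1 are assumptions made here.

```python
import math


def chebyshev_sum_bound(var_x, var_y, s, t):
    """Right-hand side of (1.2) for n = 2, capped at 1 since it bounds a probability."""
    return min(1.0, var_x / s ** 2 + var_y / t ** 2)


def improved_bound(var_x, var_y, s, t):
    """L(s, t) as in (1.4); requires var_x / s^2 <= var_y / t^2."""
    lam = var_x / s ** 2
    mu = var_y / t ** 2
    if lam > mu:
        raise ValueError("Theorem 1.1 assumes var_x / s^2 <= var_y / t^2")
    if lam + mu >= 1:
        return 1.0
    # Point at which the two non-trivial branches of (1.4) exchange roles.
    threshold = 0.5 * (lam + 2 * mu + math.sqrt(lam ** 2 + 4 * mu ** 2))
    if threshold >= 1:
        return mu / (1 - lam)
    return lam + mu - lam * mu
```

For unit variances and s = t = 2 the elementary bound is 0.5 while `improved_bound(1, 1, 2, 2)` is 0.4375; for variances 0.2 and 0.7 with s = t = 1 the middle branch applies and gives 0.875 instead of 0.9.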

This theorem is a special case of the more general statement:

Theorem 1.2. Let W, Z be independent random variables such that

(1.4)    $P(W < 0) = P(Z < 0) = 0,$

(1.5)    $E(W) = \lambda, \quad E(Z) = \mu,$

(1.6)    $\lambda \le \mu.$

Then, for any t > 0, we have

(1.7)    $P(W + Z \ge t) \le M(t)$

where

(1.8)    $M(t) = 1$    if $t \le \lambda + \mu$,

         $M(t) = \frac{\mu}{t - \lambda}$    if $\lambda + \mu \le t \le \frac{1}{2}\left( \lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2} \right)$,

         $M(t) = \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2}$    if $\frac{1}{2}\left( \lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2} \right) \le t$.

For any given $\lambda > 0$, $\mu > 0$, $\lambda \le \mu$, and t > 0, there exist independent variables
W, Z such that (1.4) and (1.6) are fulfilled and that the equality sign is true in (1.7).
Theorem 1.1 is obtained from Theorem 1.2 by writing

$W = \frac{(X - x_0)^2}{s^2}, \quad Z = \frac{(Y - y_0)^2}{t^2}, \quad t = 1.$
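The bound M(t) and the claim that it is attained can be explored numerically. The sketch below is an illustration under assumptions made here: the helper names are hypothetical, the extremal laws are two-point distributions of the kind used later in the proof, and the random search only probes the inequality without proving it.

```python
import math
import random
from itertools import product


def switch_point(lam, mu):
    """Value of t at which the two non-trivial branches of (1.8) coincide."""
    return 0.5 * (lam + 2 * mu + math.sqrt(lam ** 2 + 4 * mu ** 2))


def M(t, lam, mu):
    """Upper bound (1.8) for P(W + Z >= t); assumes 0 <= lam <= mu and t > 0."""
    if t <= lam + mu:
        return 1.0
    if t <= switch_point(lam, mu):
        return mu / (t - lam)
    return (lam + mu) / t - lam * mu / t ** 2


def tail_prob(w_law, z_law, t, tol=1e-12):
    """P(W + Z >= t) for independent discrete laws given as {value: probability}."""
    return sum(p * q
               for (w, p), (z, q) in product(w_law.items(), z_law.items())
               if w + z >= t - tol)  # tol absorbs rounding on the line W + Z = t


def extremal_laws(t, lam, mu):
    """Laws of W and Z with means lam and mu for which the bound is attained."""
    if t <= lam + mu:
        return {lam: 1.0}, {mu: 1.0}
    if t <= switch_point(lam, mu):
        q2 = mu / (t - lam)
        return {lam: 1.0}, {0.0: 1 - q2, t - lam: q2}
    return {0.0: 1 - lam / t, t: lam / t}, {0.0: 1 - mu / t, t: mu / t}


def largest_excess(lam, mu, t, trials=2000, seed=0):
    """Random search over two-point laws with the given means; max of P - M(t)."""
    rng = random.Random(seed)
    worst = -1.0
    for _ in range(trials):
        w1, z1 = rng.uniform(0, lam), rng.uniform(0, mu)
        w2 = lam + rng.uniform(0.01, 2 * t)
        z2 = mu + rng.uniform(0.01, 2 * t)
        p2 = (lam - w1) / (w2 - w1)  # forces E(W) = lam
        q2 = (mu - z1) / (z2 - z1)   # forces E(Z) = mu
        p = tail_prob({w1: 1 - p2, w2: p2}, {z1: 1 - q2, z2: q2}, t)
        worst = max(worst, p - M(t, lam, mu))
    return worst
```

With $\lambda = 1$, $\mu = 2$ the switch point is about 4.56, so t = 4 falls in the middle branch and t = 6 in the last; in both cases `tail_prob(*extremal_laws(t, 1.0, 2.0), t)` reproduces `M(t, 1.0, 2.0)`.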


2. Before proving Theorem 1.2 we shall derive two lemmas. The first of these
lemmas deals with more than one variable. Since its proof for general m does not
present any additional difficulties it will be stated and proven for any number
$m \ge 1$ of variables, although in the proof of Theorem 1.2 it will be used only
for m = 1.

Lemma 1. Let $U, V_1, V_2, \ldots, V_m$ be independent discrete random variables
with only non-negative possible values, and let U have a probability distribution
with the possible values $0 \le U_1 < U_2 < \cdots < U_n$ and the probabilities $P(U_i) = r_i$
for $i = 1, 2, \ldots, n$. We consider any three possible values $U_j$, $U_k$, $U_l$ of U such
that

$0 \le U_j < U_k < U_l,$

with the corresponding probabilities $r_j$, $r_k$, $r_l$. Then, for any t > 0, there exists a
random variable U' with the same distribution as U except that the probabilities
$r_j$, $r_k$, $r_l$ of $U_j$, $U_k$, $U_l$ are replaced by $r_j'$, $r_k'$, $r_l'$ such that

(2.1)    $E(U') = E(U)$

(2.2)    one of $r_j'$, $r_k'$, $r_l'$ is zero

(2.3)    $P(U' + V_1 + \cdots + V_m \ge t) \ge P(U + V_1 + \cdots + V_m \ge t).$

Proof: let $r_j'$, $r_k'$, $r_l'$ be written

(2.4)    $r_j' = r_j + \alpha\beta, \quad r_k' = r_k - \beta, \quad r_l' = r_l + (1 - \alpha)\beta.$

For any $\alpha$, $\beta$ we then have

$r_j' + r_k' + r_l' = r_j + r_k + r_l.$

Choosing

(2.5)    $\alpha = (U_l - U_k)/(U_l - U_j)$

we obtain the equality

$U_j r_j' + U_k r_k' + U_l r_l' = U_j r_j + U_k r_k + U_l r_l$

so that (2.1) is true for any $\beta$.

We obviously have

(2.6)    $P\left( U + \sum_{\nu=1}^{m} V_\nu \ge t \right) = \sum_{i=1}^{n} P(U = U_i) \cdot P\left( \sum_{\nu=1}^{m} V_\nu \ge t - U_i \right) = \sum_{i=1}^{n} r_i\, P\left( \sum_{\nu=1}^{m} V_\nu \ge t - U_i \right).$

The variable U' has the same possible values $U_i$ as the variable U. Writing
$P(U' = U_i) = r_i'$, for $i = 1, 2, \ldots, n$, we also have

(2.7)    $P\left( U' + \sum_{\nu=1}^{m} V_\nu \ge t \right) = \sum_{i=1}^{n} r_i'\, P\left( \sum_{\nu=1}^{m} V_\nu \ge t - U_i \right).$

From (2.6), (2.7), and (2.4) we obtain

(2.8)    $P\left( U' + \sum_{\nu=1}^{m} V_\nu \ge t \right) - P\left( U + \sum_{\nu=1}^{m} V_\nu \ge t \right)$

         $= \alpha\beta\, P\left( \sum_{\nu=1}^{m} V_\nu \ge t - U_j \right) - \beta\, P\left( \sum_{\nu=1}^{m} V_\nu \ge t - U_k \right) + (1 - \alpha)\beta\, P\left( \sum_{\nu=1}^{m} V_\nu \ge t - U_l \right).$

For $\alpha$ determined by (2.5), the right-hand side of (2.8) is of the form $C\beta$, and
will be non-negative if sign $\beta$ = sign C. If sign C is positive, we choose $\beta = r_k$ and
have, from (2.4), $r_k' = 0$, and, from (2.8), the inequality (2.3). If sign C is
negative, we set $\beta = \mathrm{Max}\left( -\frac{r_j}{\alpha},\ -\frac{r_l}{1 - \alpha} \right)$, which leads to either $r_j' = 0$ or $r_l' =
0$, and again to (2.3). In both cases we have kept the probabilities $r_j'$, $r_k'$, $r_l'$
non-negative as they should be.

Lemma 2. Let the discrete random variable U have only the two non-negative
values $U_1 < U_2$, with the corresponding probabilities $r_1$, $r_2$, and let t be a given
number such that

(2.9)    $E(U) < t < U_2.$

Then there exists a number a > 0 such that the random variable U' with the possible
values

(2.91)    $U_1' = U_1 + a, \quad U_2' = t$

and the corresponding probabilities $r_1$, $r_2$, has the properties

(2.92)    $0 \le U_1' < U_2'$

(2.93)    $E(U') = E(U).$

Proof: to have (2.91) and (2.93) it is sufficient to choose

$a = \frac{r_2 (U_2 - t)}{r_1}.$

Then (2.92) is also fulfilled since, in view of (2.9), we have

$U_1' = \frac{r_1 U_1 + r_2 U_2 - r_2 t}{r_1} = \frac{E(U) - r_2 t}{r_1} < \frac{t - r_2 t}{r_1} = t = U_2',$

and obviously a > 0 and hence $U_1' > U_1 \ge 0$.
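Both lemmas describe explicit constructions, which can be sketched in code for the case m = 1. The function names are hypothetical, laws are represented as dictionaries from values to probabilities, and exact rational arithmetic is assumed so that the equalities (2.1) and (2.93) can be checked without rounding.

```python
from fractions import Fraction as Fr


def tail(law, c):
    """P(V >= c) for a discrete law {value: probability}."""
    return sum(p for v, p in law.items() if v >= c)


def tail_of_sum(u_law, v_law, t):
    """P(U + V >= t) for independent U and V, as in (2.6) with m = 1."""
    return sum(r * tail(v_law, t - u) for u, r in u_law.items())


def lemma1_step(u_law, v_law, uj, uk, ul, t):
    """Shift mass among uj < uk < ul as in (2.4), keeping E(U) and not lowering the tail."""
    rj, rk, rl = u_law[uj], u_law[uk], u_law[ul]
    alpha = (ul - uk) / (ul - uj)  # (2.5)
    # Coefficient C of beta on the right-hand side of (2.8).
    c = (alpha * tail(v_law, t - uj) - tail(v_law, t - uk)
         + (1 - alpha) * tail(v_law, t - ul))
    if c >= 0:
        beta = rk  # empties the middle value
    else:
        beta = max(-rj / alpha, -rl / (1 - alpha))  # empties an outer value
    new_law = dict(u_law)
    new_law[uj] = rj + alpha * beta
    new_law[uk] = rk - beta
    new_law[ul] = rl + (1 - alpha) * beta
    return new_law


def lemma2_step(u1, u2, r1, r2, t):
    """Replace (U1, U2) by (U1 + a, t) with the same probabilities and the same mean."""
    mean = r1 * u1 + r2 * u2
    if not (mean < t < u2):
        raise ValueError("Lemma 2 assumes E(U) < t < U2")
    a = r2 * (u2 - t) / r1
    return u1 + a, t
```

As a small worked case, take U with values 0, 1, 3 and probabilities 1/4, 1/2, 1/4, V with values 0, 2 each with probability 1/2, and t = 3. Here C is negative, the mass at 0 is removed, the mean 5/4 is unchanged, and P(U + V ≥ 3) rises from 1/2 to 9/16.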

3. Theorem 1.2 will first be proven under the assumption that W and Z are
discrete random variables, each with a finite number of non-negative possible
values. By repeatedly applying Lemma 1 with m = 1, U = W, $V_1$ = Z, we
reduce the number of possible values of W which have non-zero probabilities
to two, and denote those possible values by $W_1 < W_2$, and their probabilities
by $p_1$ and $p_2 = 1 - p_1$. Then, applying Lemma 1 to the case m = 1, U = Z,
$V_1$ = W, we similarly reduce the possible values of Z to the two non-negative
values $Z_1 < Z_2$, and denote the corresponding probabilities by $q_1$ and $q_2 = 1 - q_1$.
Throughout all these steps the expectations $E(W) = \lambda$ and $E(Z) = \mu$ remain
unchanged, and $P(W + Z \ge t)$ is not decreased.

For $t \le \lambda + \mu$, inequality (1.7) is obviously true, and equality is attained for
W having the only possible value $\lambda$ with probability 1 and Z having the only
possible value $\mu$ with probability 1.

For the remainder of the proof we assume $t > \lambda + \mu$. We then have

$t > \lambda + \mu \ge \lambda + Z_1 \ge W_1 + Z_1.$

If $W_2 > t$, we may replace it by $W_2 = t$ according to Lemma 2. Similarly, if
$Z_2 > t$, we may replace it by $Z_2 = t$. The probability $P(W + Z \ge t)$ is not
decreased in this process. We may thus assume, without loss of generality, that

$W_2 \le t, \quad Z_2 \le t.$

The joint distribution of (W, Z) has now the possible values represented by the
four points $(W_1, Z_1)$, $(W_1, Z_2)$, $(W_2, Z_1)$, $(W_2, Z_2)$. The coordinates of those
four points and their probabilities fulfill the following conditions

(3.1)    $0 \le W_1 \le \lambda \le W_2 \le t; \quad 0 \le Z_1 \le \mu \le Z_2 \le t$

(3.2)    $p_1 + p_2 = q_1 + q_2 = 1$

(3.3)    $p_1 W_1 + p_2 W_2 = \lambda, \quad q_1 Z_1 + q_2 Z_2 = \mu.$

In view of (3.1), the point $(W_1, Z_1)$ always lies below the line W + Z = t. The
other points may or may not lie below that line. Accordingly, we distinguish
the cases listed in Table I. These clearly include all possible cases since $(W_2, Z_2)$
can not be below the line W + Z = t without all the other points being below
that line.

TABLE I

Case   Points below line W + Z = t                          Points not below line W + Z = t
I      (W_1, Z_1)                                            (W_1, Z_2), (W_2, Z_1), (W_2, Z_2)
II     (W_1, Z_1), (W_2, Z_1)                                (W_1, Z_2), (W_2, Z_2)
III    (W_1, Z_1), (W_1, Z_2)                                (W_2, Z_1), (W_2, Z_2)
IV     (W_1, Z_1), (W_1, Z_2), (W_2, Z_1)                    (W_2, Z_2)
V      (W_1, Z_1), (W_1, Z_2), (W_2, Z_1), (W_2, Z_2)        none

In case V we have $P(W + Z \ge t) = 0$.

For the discussion of the remaining cases we note the following relationships
which follow from (3.2) and (3.3):

$p_1 = \frac{W_2 - \lambda}{W_2 - W_1}, \quad p_2 = \frac{\lambda - W_1}{W_2 - W_1}, \quad q_1 = \frac{Z_2 - \mu}{Z_2 - Z_1}, \quad q_2 = \frac{\mu - Z_1}{Z_2 - Z_1}.$

In case I we have

(3.41)    $W_1 + Z_1 < t, \quad W_1 + Z_2 \ge t, \quad W_2 + Z_1 \ge t, \quad W_2 + Z_2 \ge t,$

$P = P(W + Z \ge t) = p_1 q_2 + p_2 q_1 + p_2 q_2 = 1 - p_1 q_1 = 1 - \frac{W_2 - \lambda}{W_2 - W_1} \cdot \frac{Z_2 - \mu}{Z_2 - Z_1}.$

Since P is a decreasing function of $W_1$ and $Z_1$, we replace $W_1$ and $Z_1$ by the
smallest values compatible with (3.41), namely $W_1 = t - Z_2$, $Z_1 = t - W_2$,
and obtain

$P \le 1 - \frac{(W_2 - \lambda)(Z_2 - \mu)}{(W_2 + Z_2 - t)^2} = R(W_2, Z_2).$

For fixed $Z_2$, $R(W_2, Z_2)$ has a minimum at $W_2 = Z_2 + 2\lambda - t$ and no other
extremum, hence it assumes its maximum at one or both of the end-points of the
interval for $W_2$ which, by (3.1) and (3.41), is

$t - Z_1 \le W_2 \le t.$

In view of (3.1) we also have $t - \mu \le t - Z_1$, and hence

$P \le \mathrm{Max}\, [R(t - \mu, Z_2),\ R(t, Z_2)].$

We find

$R(t - \mu, Z_2) = 1 - \frac{t - \lambda - \mu}{Z_2 - \mu} \le 1 - \frac{t - \lambda - \mu}{t - \mu} = \frac{\lambda}{t - \mu}$

and

$R(t, Z_2) = 1 - \frac{(t - \lambda)(Z_2 - \mu)}{Z_2^2} = R^{(1)}(Z_2).$

This last expression has a minimum for $Z_2 = 2\mu$ and no other extremum, hence it
assumes its maximum at the ends of the interval for $Z_2$ which, by (3.41) and
(3.1), is

$t - W_1 \le Z_2 \le t.$

From (3.1) we also have $t - \lambda \le t - W_1$ and thus

$R(t, Z_2) \le \mathrm{Max}\, [R^{(1)}(t - \lambda),\ R^{(1)}(t)] = \mathrm{Max} \left[ \frac{\mu}{t - \lambda},\ \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2} \right].$

Finally, we obtain

$P \le \mathrm{Max} \left[ \frac{\lambda}{t - \mu},\ \frac{\mu}{t - \lambda},\ \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2} \right].$

Each of the values $P = \frac{\lambda}{t - \mu},\ \frac{\mu}{t - \lambda},\ \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2}$ can be attained in case I,
as is shown by the probability distributions

(3.42)    $W_1 = 0, \quad W_2 = t - \mu, \quad Z_1 = \mu, \quad Z_2 = t,$
          $p_1 = 1 - \frac{\lambda}{t - \mu}, \quad p_2 = \frac{\lambda}{t - \mu}, \quad q_1 = 1, \quad q_2 = 0;$

(3.43)    $W_1 = \lambda, \quad W_2 = t, \quad Z_1 = 0, \quad Z_2 = t - \lambda,$
          $p_1 = 1, \quad p_2 = 0, \quad q_1 = 1 - \frac{\mu}{t - \lambda}, \quad q_2 = \frac{\mu}{t - \lambda};$

(3.44)    $W_1 = 0, \quad W_2 = t, \quad Z_1 = 0, \quad Z_2 = t,$
          $p_1 = 1 - \frac{\lambda}{t}, \quad p_2 = \frac{\lambda}{t}, \quad q_1 = 1 - \frac{\mu}{t}, \quad q_2 = \frac{\mu}{t}.$

In case II we have

(3.51)    $W_1 + Z_1 < t, \quad W_2 + Z_1 < t, \quad W_1 + Z_2 \ge t, \quad W_2 + Z_2 \ge t,$

$P = P(W + Z \ge t) = p_1 q_2 + p_2 q_2 = q_2 = \frac{\mu - Z_1}{Z_2 - Z_1}.$

This is a decreasing function of $Z_1$ as well as of $Z_2$ and hence takes its maximum
for the smallest values of $Z_1$ and $Z_2$ compatible with (3.1) and (3.51), that is for
$Z_1 = 0$, $Z_2 = t - \lambda$. We thus obtain

$P \le \frac{\mu}{t - \lambda}.$

This upper bound can be attained in case II, as may be seen from the distribution

(3.52)    $W_1 = \lambda, \quad W_2 = \lambda, \quad Z_1 = 0, \quad Z_2 = t - \lambda,$
          $p_1 = 1, \quad p_2 = 0, \quad q_1 = 1 - \frac{\mu}{t - \lambda}, \quad q_2 = \frac{\mu}{t - \lambda}.$

Case III is symmetrical with case II and leads to the inequality

$P \le \frac{\lambda}{t - \mu}.$


In case IV we have

(3.61)    $W_1 + Z_1 < t, \quad W_1 + Z_2 < t, \quad W_2 + Z_1 < t, \quad W_2 + Z_2 \ge t,$

$P = P(W + Z \ge t) = p_2 q_2 = \frac{(\lambda - W_1)(\mu - Z_1)}{(W_2 - W_1)(Z_2 - Z_1)}.$

The right hand side is a decreasing function of each of the variables $W_1$, $W_2$,
$Z_1$, $Z_2$, and hence is increased by choosing for these variables the smallest values
compatible with (3.61), i.e.

(3.62)    $W_1 = Z_1 = 0, \quad W_2 + Z_2 = t$

for which we obtain

$P \le \frac{\lambda\mu}{W_2 (t - W_2)} = R^{(2)}(W_2).$

Since $R^{(2)}(W_2)$ has a minimum at $W_2 = \frac{t}{2}$ and no other extremum, it attains its
largest value at one of the end points of the interval for $W_2$ which, by (3.1),
(3.61) and (3.62), is

$\lambda \le W_2 \le t - \mu.$

This leads to

$P \le \mathrm{Max}\, [R^{(2)}(\lambda),\ R^{(2)}(t - \mu)] = \mathrm{Max} \left[ \frac{\mu}{t - \lambda},\ \frac{\lambda}{t - \mu} \right].$

The upper bounds $\frac{\mu}{t - \lambda}$, $\frac{\lambda}{t - \mu}$, respectively, are attained in case IV for the
probability distributions

(3.63)    $W_1 = 0, \quad W_2 = \lambda, \quad Z_1 = 0, \quad Z_2 = t - \lambda,$
          $p_1 = 0, \quad p_2 = 1, \quad q_1 = 1 - \frac{\mu}{t - \lambda}, \quad q_2 = \frac{\mu}{t - \lambda},$

and

          $W_1 = 0, \quad W_2 = t - \mu, \quad Z_1 = 0, \quad Z_2 = \mu,$
          $p_1 = 1 - \frac{\lambda}{t - \mu}, \quad p_2 = \frac{\lambda}{t - \mu}, \quad q_1 = 0, \quad q_2 = 1.$

From the preceding discussion we conclude that $P = P(W + Z \ge t)$ always
fulfills the inequality

$P \le \mathrm{Max} \left[ \frac{\lambda}{t - \mu},\ \frac{\mu}{t - \lambda},\ \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2} \right] = U(t)$

for $t > \lambda + \mu$. Since we have assumed $\lambda \le \mu$, we have $\frac{\lambda}{t - \mu} \le \frac{\mu}{t - \lambda}$ for
$t > \lambda + \mu$, and therefore

$U(t) = \mathrm{Max} \left[ \frac{\mu}{t - \lambda},\ \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2} \right]$

for $t > \lambda + \mu$. It is easily verified that

$\frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2} \le \frac{\mu}{t - \lambda}$    for $\lambda + \mu < t \le \frac{1}{2}\left( \lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2} \right)$

and

$\frac{\mu}{t - \lambda} \le \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2}$    for $\frac{1}{2}\left( \lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2} \right) \le t$

so that we have U(t) = M(t) as defined in (1.8). For given $\lambda$, $\mu$ and any $t > \lambda
+ \mu$, the equality $P = \frac{\mu}{t - \lambda}$ is fulfilled for the distributions (3.43), (3.52) and
(3.63), while the equality $P = \frac{\lambda + \mu}{t} - \frac{\lambda\mu}{t^2}$ is true for the distribution (3.44).

This completes the proof of Theorem 1.2 for discrete random variables. If
W and Z are independent random variables with the cumulative probability
functions $P(W \le w) = F(w)$ and $P(Z \le z) = G(z)$, then each of these cumulative
probability functions can be uniformly approximated by a step function with a
finite number of steps, that is by the cumulative probability function of a discrete
random variable with a finite number of possible values. Since for such variables
Theorem 1.2 is proven, it also is true for the general random variables W and Z.

4. An attempt to extend the method used in proving Theorem 1.2 to more
than two variables leads to arguments of a prohibitive length. It is possible,
however, to obtain corollaries of Theorems 1.1 and 1.2 which lead to an
improvement of inequality (1.2) for n variables.

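Corollary 2.1. Let $X_1, X_2, \ldots, X_n$ be independent random variables with
expectations $E(X_j) = e_j$ and variances $\sigma^2(X_j) = \sigma_j^2$. Then, for any $t_j > 0$,
$j = 1, 2, \ldots, n$, and any m such that

$\Sigma_1 = \sum_{j=1}^{m} \frac{\sigma_j^2}{t_j^2} \le \sum_{j=m+1}^{n} \frac{\sigma_j^2}{t_j^2} = \Sigma_2$

we have the inequality

$P\left( \sum_{j=1}^{n} \frac{(X_j - e_j)^2}{t_j^2} \ge 1 \right) \le 1$    if $\Sigma_1 + \Sigma_2 \ge 1$,

$P\left( \sum_{j=1}^{n} \frac{(X_j - e_j)^2}{t_j^2} \ge 1 \right) \le 1 - \frac{1 - (\Sigma_1 + \Sigma_2)}{1 - \Sigma_1}$    if $\Sigma_1 + \Sigma_2 \le 1 \le \frac{1}{2}\left[ \Sigma_1 + 2\Sigma_2 + \sqrt{\Sigma_1^2 + 4\Sigma_2^2} \right]$,

$P\left( \sum_{j=1}^{n} \frac{(X_j - e_j)^2}{t_j^2} \ge 1 \right) \le \Sigma_1 + \Sigma_2 - \Sigma_1 \Sigma_2$    if $\frac{1}{2}\left[ \Sigma_1 + 2\Sigma_2 + \sqrt{\Sigma_1^2 + 4\Sigma_2^2} \right] \le 1$.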

This corollary is a special case of the following corollary to Theorem 1.2.

Corollary 2.2. Let $W_1, W_2, \ldots, W_n$ be independent random variables
such that $P(W_j < 0) = 0$ for $j = 1, 2, \ldots, n$, and let $m$ be any integer such that

$$\sum_{j=1}^{m} E(W_j) = \lambda, \qquad \sum_{j=m+1}^{n} E(W_j) = \mu, \qquad \lambda \le \mu.$$

Then, for any $t > 0$, we have

$$P\left(\sum_{j=1}^{n} W_j \ge t\right) \le M(t),$$

where $M(t)$ is defined by (1.8).

This corollary follows immediately from Theorem 1.2 by writing

$$W = \sum_{j=1}^{m} W_j, \qquad Z = \sum_{j=m+1}^{n} W_j.$$

To obtain Corollary 2.1, one only has to write in Corollary 2.2

$$W_j = \frac{(X_j - e_j)^2}{t_j^2}, \qquad t = 1.$$

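If some additional assumptions are made on the expectations $E(W_j)$ or on the
variances $\sigma_j^2$, the upper bounds in Corollaries 2.1 and 2.2 may be minimized by
proper choice of $m$ or of the $t_j$. For example, if all the variances are equal,

$$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 = \sigma^2,$$

and $n$ is even, one obtains the inequality

$$P\left(\sum_{j=1}^{n} (X_j - e_j)^2 \ge t^2\right) \le
\begin{cases}
1 & \text{if } t^2 \le n\sigma^2, \\[4pt]
\dfrac{n\sigma^2/2}{t^2 - n\sigma^2/2} & \text{if } n\sigma^2 \le t^2 \le \dfrac{3 + \sqrt{5}}{4}\, n\sigma^2, \\[8pt]
\dfrac{n\sigma^2}{t^2}\left(1 - \dfrac{n\sigma^2}{4t^2}\right) & \text{if } \dfrac{3 + \sqrt{5}}{4}\, n\sigma^2 \le t^2.
\end{cases}$$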
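The piecewise bound can be evaluated numerically. The following is a minimal sketch, not taken from the paper: it assumes that $M(t)$ of (1.8) is the piecewise form reached in the computation of $U(t)$ above, and the function names `chebyshev_bound_M` and `equal_variance_bound` are hypothetical labels chosen for illustration.

```python
import math


def chebyshev_bound_M(t, lam, mu):
    """Bound for P(W + Z >= t), W and Z independent and non-negative.

    lam and mu are the two expectations; the sketch assumes the piecewise
    form of M(t) discussed in the text and reorders so that lam <= mu.
    """
    if lam > mu:
        lam, mu = mu, lam
    if t <= lam + mu:
        return 1.0
    # Point where the two candidate bounds cross.
    t_star = 0.5 * (lam + 2.0 * mu + math.sqrt(lam ** 2 + 4.0 * mu ** 2))
    if t <= t_star:
        return mu / (t - lam)
    return (lam + mu) / t - lam * mu / t ** 2


def equal_variance_bound(t, n, sigma2):
    """Special case with n (even) equal variances sigma2, threshold t**2."""
    s = n * sigma2
    t2 = t * t
    if t2 <= s:
        return 1.0
    if t2 <= (3.0 + math.sqrt(5.0)) / 4.0 * s:
        return (s / 2.0) / (t2 - s / 2.0)
    return (s / t2) * (1.0 - s / (4.0 * t2))
```

Splitting $n$ equal variances into two groups of $n/2$ gives $\lambda = \mu = n\sigma^2/2$, so `equal_variance_bound(t, n, sigma2)` should agree with `chebyshev_bound_M(t * t, n * sigma2 / 2, n * sigma2 / 2)`. For $t > \lambda + \mu$ the bound never exceeds the first-moment bound $(\lambda + \mu)/t$.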





DISTRIBUTION OF THE SERIAL CORRELATION COEFFICIENT 
IN A CIRCULARLY CORRELATED UNIVERSE 1 


By R. B. Leipnik 

Cowles Commission for Research in Economics


1. Summary. It is desired to find an approximate distribution of simple
form for the statistic

$$\bar{r} = \frac{x_1 x_2 + \cdots + x_T x_1}{x_1^2 + \cdots + x_T^2}$$

($\bar{r}$ is an estimate of the serial correlation coefficient $\rho$ in a circular
universe) in the case that $\rho \ne 0$ in the universe. Such a distribution is
obtained by smoothing the joint characteristic function of the numerator and
denominator of the expression for $\bar{r}$. The first two moments are calculated;
from these $\bar{r}$ is seen to be a consistent estimate of $\rho$. A graph of this
distribution for sample size $T = 20$ and various values of $\rho$ is given.

In addition, an approximate distribution for $p = x_1^2 + \cdots + x_T^2$ is derived
which reduces to the exact ($\chi^2$-) distribution if $\rho = 0$. From a formula which
yields all moments, it is concluded that, at least up to the degree of
approximation attained, $p/T$ is an unbiased and consistent estimate of $\sigma^2$.


2. Several writers have investigated the temporally homogeneous stochastic
process defined by

$$x_t - \rho x_{t-1} = z_t, \qquad t = 1, 2, \ldots, T, \qquad |\rho| < 1, \tag{1}$$

where the $z_t$ are unobservable disturbances, normally and independently
distributed with mean zero and variance $\sigma^2$, the $x_t$ are observed variates, and the
"first observation" $x_0$ has a normal distribution with mean zero and such a
variance $\sigma_x^2$ that all later observations have the same variance. Thus we have

$$\sigma_x^2 = \frac{\sigma^2}{1 - \rho^2} \tag{2}$$

and the joint distribution of a sample of $T + 1$ successive values is

$$f(x_0, \ldots, x_T) = \frac{(1-\rho^2)^{1/2}}{(2\pi\sigma^2)^{(T+1)/2}} \exp\left[-\frac{1}{2\sigma^2}\left(x_0^2 + x_T^2 - 2\rho(x_0x_1 + \cdots + x_{T-1}x_T) + (1+\rho^2)(x_1^2 + \cdots + x_{T-1}^2)\right)\right]. \tag{3}$$


Koopmans ([1], formula 96), by smoothing characteristic values, has obtained
an approximation to the distribution of the serial correlation coefficient $r$ for the
case $\rho = 0$, where

$$r = \frac{x_0 x_1 + \cdots + x_{T-1} x_T}{x_0^2 + \cdots + x_T^2}. \tag{4}$$


1 Cowles Commission Papers, New Series, No. 21.

This result is expressed in the form of a definite integral whose evaluation
has not so far been effected.

By considering the related circular stochastic process, where $x_0$ is defined to
be the same observation as $x_T$, great simplification is obtained. Here the
joint distribution of $x_1, \ldots, x_T$ is

$$f(x_1, \ldots, x_T) = \frac{\lambda(\rho)}{(2\pi\sigma^2)^{T/2}} \exp\left[-\frac{1}{2\sigma^2(1-\rho^2)}\left\{(1+\rho^2)(x_1^2 + \cdots + x_T^2) - 2\rho(x_1x_2 + \cdots + x_Tx_1)\right\}\right], \tag{5}$$

$$\lambda(\rho) = \frac{1 - \rho^T}{(1-\rho^2)^{T/2}}.$$


By smoothing characteristic values, Koopmans ([1], formula 92) found a definite
integral and Dixon ([2], 3.22) an explicit expression for an approximate
distribution of the circular serial correlation coefficient $\bar{r}$, for the case $\rho = 0$, where

$$\bar{r} = \frac{x_1x_2 + \cdots + x_Tx_1}{x_1^2 + \cdots + x_T^2}. \tag{6}$$

Dixon's distribution $R_0(\bar{r})$ has the simple form

$$R_0(\bar{r}) = \frac{\Gamma\left(\frac{T}{2} + 1\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)} (1 - \bar{r}^2)^{(T-1)/2}. \tag{7}$$

Rubin [3] proved these results to be equivalent. On the other hand, R. L.
Anderson [4] obtained the exact distribution of $\bar{r}$ in the case $\rho = 0$. Madow [5]
extended this result to the case $\rho \ne 0$, using a property of sufficient statistics
also noted by Koopmans ([1], p. 17) in connection with the non-circular problem.

It would, however, be difficult to find percentile points or moments from
Madow's exact distribution. An approximate distribution of $\bar{r}$ for $\rho \ne 0$,
together with its moments, analogous to Dixon-Koopmans' for $\rho = 0$, should
therefore be of interest. The purpose of this paper is to obtain such a
distribution from the circular universe (5). The statistic $\bar{r}$ is shown to be a consistent
estimate of $\rho$ within the limits imposed by the approximation. In addition, an
approximate distribution for $p = x_1^2 + \cdots + x_T^2$ in the case $\rho \ne 0$ (which
reduces to the exact chi-squared distribution when $\rho = 0$) is derived, together
with all of its moments.


3. We begin by asking about an approximate joint distribution of $p$ and $q$
defined by

$$p = x_1^2 + \cdots + x_T^2, \qquad q = x_1x_2 + \cdots + x_Tx_1. \tag{8}$$

Defining $\phi(u, v)$ as the expectation of $\exp[i(up + vq)]$, we have

$$\phi(u, v) = \frac{\lambda(\rho)}{(2\pi\sigma^2)^{T/2}} \int \cdots \int \exp\left[-\frac{1}{2\sigma^2}\left\{\left(\frac{1+\rho^2}{1-\rho^2} - 2i\sigma^2u\right)p - 2\left(\frac{\rho}{1-\rho^2} + i\sigma^2v\right)q\right\}\right] dx_1 \cdots dx_T. \tag{9}$$


On integration, we find

$$\phi(u, v) = \lambda(\rho)[\Delta(u, v)]^{-1/2} \tag{10}$$

where $\Delta(u, v)$ is the determinant of the matrix associated with the quadratic
form within the curly brackets in (9). $\Delta(u, v)$ is a circulant; its value as
determined from the circulant formula ([2], p. 123) is

$$\Delta(u, v) = \prod_{t=1}^{T}\left(y - 2z\cos\frac{2\pi t}{T}\right) \tag{11}$$

where $y$ and $z$ are defined by

$$y = \frac{1+\rho^2}{1-\rho^2} - 2i\sigma^2u, \qquad z = \frac{\rho}{1-\rho^2} + i\sigma^2v. \tag{12}$$


To get an approximation $\bar\Delta(u, v)$ to $\Delta(u, v)$ we smooth $\log \Delta(u, v)$ by Koopmans'
method. We have

$$\log \Delta(u, v) = \sum_{t=1}^{T} \log\left(y - 2z\cos\frac{2\pi t}{T}\right). \tag{13}$$

We define $\bar\Delta(u, v)$ through

$$\log \bar\Delta(u, v) = \int_0^T \log\left(y - 2z\cos\frac{2\pi t}{T}\right) dt \tag{14}$$

in which the summation in (13) is replaced by integration. The integral in (14)
is easily evaluated ([6], p. 65) giving

$$\bar\Delta(u, v) = \left(\frac{y + \sqrt{y^2 - 4z^2}}{2}\right)^T. \tag{15}$$

Incidentally, had we used $q_L = x_1x_{L+1} + \cdots + x_Tx_{T+L}$ in place of $q_1 = q$ in (9),
we would have obtained the same expression (15) for $\bar\Delta(u, v)$.
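The effect of replacing the sum in (13) by the integral in (14) can be checked numerically at $u = v = 0$, where $y$ and $z$ are real. The sketch below is an illustration under that assumption only; the function names are hypothetical. At this point the exact product of the circulant eigenvalues equals $(1-\rho^T)^2/(1-\rho^2)^T$ and the smoothed value equals $(1-\rho^2)^{-T}$, so the two logarithms should differ by $2\log(1-\rho^T)$.

```python
import math


def exact_log_delta(y, z, T):
    """Sum in (13): log of the circulant determinant, real y > 2|z| assumed."""
    return sum(math.log(y - 2.0 * z * math.cos(2.0 * math.pi * t / T))
               for t in range(1, T + 1))


def smoothed_log_delta(y, z, T):
    """Closed form (15) obtained by replacing the sum with an integral."""
    return T * math.log((y + math.sqrt(y * y - 4.0 * z * z)) / 2.0)


rho, T = 0.5, 20
y0 = (1.0 + rho ** 2) / (1.0 - rho ** 2)   # value of y at u = 0
z0 = rho / (1.0 - rho ** 2)                # value of z at v = 0
gap = exact_log_delta(y0, z0, T) - smoothed_log_delta(y0, z0, T)
```

For $|\rho| < 1$ the gap shrinks geometrically in $T$, which is one way to see why the smoothed determinant is a close substitute for moderate sample sizes.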

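Setting $\bar\phi(u, v) = \bar\lambda(\rho)[\bar\Delta(u, v)]^{-1/2}$ we may determine $\bar\lambda(\rho)$ by the requirement
$\bar\phi(0, 0) = 1$. A simple calculation yields the result $\bar\lambda(\rho) = (1 - \rho^2)^{-T/2}$. (Note
that $\lambda(\rho)/\bar\lambda(\rho) = 1 - \rho^T$ is close to 1 for large values of $T$.) Our result for
$\bar\phi(u, v)$ appears as

$$\bar\phi(u, v) = \left[\frac{(1-\rho^2)\left(y + \sqrt{y^2 - 4z^2}\right)}{2}\right]^{-T/2}. \tag{16}$$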
The approximate joint distribution of $p$ and $q$ may be written as the double
Fourier integral

$$B(p, q) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i(up + vq)}\left[\frac{(1-\rho^2)\left(y + \sqrt{y^2 - 4z^2}\right)}{2}\right]^{-T/2} du\, dv \tag{17}$$

which we evaluate ([7], 576.3, 914.3) by changing integration variables from
$u$, $v$ to $y$, $z$ and integrating out $y$ and $z$ successively. We obtain finally

$$B(p, q) = \frac{\frac{T}{2}\left[2\sigma^2(1-\rho^2)\right]^{-T/2}}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)}\; p^{-(T/2)-1}(p^2 - q^2)^{(T-1)/2} \exp\left[-\frac{(1+\rho^2)p - 2\rho q}{2\sigma^2(1-\rho^2)}\right]. \tag{18}$$

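Changing variables from $p$, $q = p\bar{r}$ to $p$, $\bar{r}$, we obtain for $F(p, \bar{r})$, the approximate
joint distribution of $p$ and $\bar{r}$, the expression

$$F(p, \bar{r}) = \frac{\frac{T}{2}\left[2\sigma^2(1-\rho^2)\right]^{-T/2}}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)}\; p^{T/2-1}(1 - \bar{r}^2)^{(T-1)/2} \exp\left[-\frac{p\,(1 + \rho^2 - 2\rho\bar{r})}{2\sigma^2(1-\rho^2)}\right]. \tag{19}$$

We could also have derived (19), following Madow, by noting that for $\rho = 0$,
$p$ and $\bar{r}$ are independently distributed, $p$ having the chi-squared distribution and $\bar{r}$
having approximately the Dixon distribution (7), and that $p$ and $\bar{r}$ are sufficient
statistics for the estimation of $\rho$ and $\sigma$.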

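4. The approximate marginal distribution $R_\rho(\bar{r})$ of $\bar{r}$ is obtained by an easy
integration from (19):

$$R_\rho(\bar{r}) = \int_0^\infty F(p, \bar{r})\,dp = \frac{\frac{T}{2}\left[2\sigma^2(1-\rho^2)\right]^{-T/2}}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)}(1 - \bar{r}^2)^{(T-1)/2}\int_0^\infty p^{T/2-1}\exp\left[-\frac{p\,(1 + \rho^2 - 2\rho\bar{r})}{2\sigma^2(1-\rho^2)}\right]dp,$$

$$R_\rho(\bar{r}) = \frac{\Gamma\left(\frac{T}{2} + 1\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)}(1 - \bar{r}^2)^{(T-1)/2}(1 + \rho^2 - 2\rho\bar{r})^{-T/2}. \tag{20}$$

Our notation is consistent since $R_\rho(\bar{r})$ indeed reduces to the Dixon distribution
for $\rho = 0$. $R_\rho(\bar{r})$ has a maximum when $\bar{r} = \bar{r}_{max}$,

$$\bar{r}_{max} = \frac{1}{2\rho(T-2)}\left\{(1+\rho^2)(T-1) - \sqrt{T(T-2)(1-\rho^2)^2 + (1+\rho^2)^2}\right\}.$$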
A little manipulation shows that $1 > |\bar{r}_{max}| > |\rho|$ and that $\bar{r}_{max} \to \rho$
asymptotically. A graph (Fig. 1) of $R_\rho(\bar{r})$ for $T = 20$, $\rho = 0, .2, .5, .7, .9$ is appended,
from which it is seen that for $|\rho|$ near 1, the distribution becomes highly
concentrated about $\bar{r}_{max}$. On differentiating $R_\rho(\bar{r})$ with respect to $\rho$ and eliminating,
the envelope of the $R_\rho(\bar{r})$ is seen to be

$$\frac{\Gamma\left(\frac{T}{2} + 1\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)}(1 - \bar{r}^2)^{-1/2}.$$

Fig. 1. Graph of the Distribution of the Serial Correlation Coefficient in a Circular
Universe, for T = 20


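5. Before evaluating the moments of $R_\rho(\bar{r})$ we will pause to obtain the
approximate marginal distribution $\bar{P}_\rho(p)$ of $p$, and its moments. We write

$$\bar{P}_\rho(p) = \frac{\frac{T}{2}\left[2\sigma^2(1-\rho^2)\right]^{-T/2}}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)}\; p^{T/2-1}\exp\left[-\frac{p}{2\sigma^2}\left(\frac{1+\rho^2}{1-\rho^2}\right)\right]\int_{-1}^{1}(1 - \bar{r}^2)^{(T-1)/2}\exp\left[\frac{\rho p\,\bar{r}}{\sigma^2(1-\rho^2)}\right]d\bar{r}. \tag{21}$$

If we define the Bessel function of order $\nu$ and purely imaginary
argument by

$$I_\nu(z) = \sum_{n=0}^{\infty}\frac{(z/2)^{\nu + 2n}}{n!\,\Gamma(\nu + n + 1)}, \tag{22}$$

we obtain ([8], p. 79), if $\rho \ne 0$

$$\bar{P}_\rho(p) = \frac{T}{2}\,\rho^{-T/2}\,p^{-1}\exp\left[-\frac{p}{2\sigma^2}\left(\frac{1+\rho^2}{1-\rho^2}\right)\right] I_{T/2}\left(\frac{\rho p}{\sigma^2(1-\rho^2)}\right), \tag{23}$$

and if $\rho = 0$

$$\bar{P}_0(p) = \frac{p^{T/2-1}e^{-p/(2\sigma^2)}}{(2\sigma^2)^{T/2}\,\Gamma\left(\frac{T}{2}\right)}, \tag{24}$$

on performing the integration indicated in (21). $\bar{P}_0(p)$ coincides with the
exact distribution $P_0(p)$. An expression covering all moments of $\bar{P}_\rho(p)$ is
obtained from (16) by setting $v = 0$, differentiating, and setting $u = 0$. We have

$$\bar\phi(u, 0) = \bar\lambda(\rho)\left[\frac{y + \sqrt{y^2 - \dfrac{4\rho^2}{(1-\rho^2)^2}}}{2}\right]^{-T/2}, \tag{25}$$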


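hence

$$E[p^k] = i^{-k}\,\frac{\partial^k}{\partial u^k}\bar\phi(u, 0)\Big|_{u=0} = (-2\sigma^2)^k(1-\rho^2)^{-T/2}\,\frac{d^k}{dy^k}\left[\frac{y + \sqrt{y^2 - \dfrac{4\rho^2}{(1-\rho^2)^2}}}{2}\right]^{-T/2}\Bigg|_{y = (1+\rho^2)/(1-\rho^2)}. \tag{26}$$

From (26), we readily find

$$E\left[\frac{p}{T}\right] = \frac{1}{T}E[p] = \sigma^2, \tag{27}$$

$$E[p^2] = (T\sigma^2)^2 + 2T\sigma^4\left(\frac{1+\rho^2}{1-\rho^2}\right). \tag{28}$$

Thus the unbiased character of $p/T$ as an estimate of $\sigma^2$ is reflected in the
approximate distribution, while (28), which shows that $\lim_{T\to\infty}\sigma^2_{p/T} = 0$, indicates
that consistency is also reflected.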

6. We now calculate the moments of $R_\rho(\bar{r})$. Interchanging the order of
integration in the expression for $E[\bar{r}^k]$ is justified by the uniform convergence, so we have





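$$E[\bar{r}^k] = \int_{-1}^{1}\bar{r}^k\left\{\int_0^\infty F(p, \bar{r})\,dp\right\}d\bar{r} = \int_0^\infty\left\{\int_{-1}^{1}\bar{r}^k F(p, \bar{r})\,d\bar{r}\right\}dp$$

$$= \frac{\frac{T}{2}\left[2\sigma^2(1-\rho^2)\right]^{-T/2}}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{T}{2} + \frac{1}{2}\right)}\int_0^\infty p^{T/2-1}\exp\left(-\frac{1+\rho^2}{2\rho}\,m\right)\left\{\int_{-1}^{1}\bar{r}^k(1 - \bar{r}^2)^{(T-1)/2}\exp(m\bar{r})\,d\bar{r}\right\}dp \tag{29}$$

where $m$ is defined by

$$m = \frac{\rho p}{\sigma^2(1-\rho^2)}. \tag{30}$$

Defining $G(m)$ by

$$G(m) = \int_{-1}^{1}(1 - \bar{r}^2)^{(T-1)/2}\exp(m\bar{r})\,d\bar{r} \tag{31}$$

we have ([8], p. 79)

$$G(m) = \Gamma\left(\tfrac{1}{2}\right)\Gamma\left(\tfrac{T}{2} + \tfrac{1}{2}\right)\left(\frac{m}{2}\right)^{-T/2} I_{T/2}(m). \tag{32}$$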


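Differentiating each side of (32) $k$ times, we find by (31) and (32)

$$\frac{d^k}{dm^k}G(m) = \int_{-1}^{1}\bar{r}^k(1 - \bar{r}^2)^{(T-1)/2}\exp(m\bar{r})\,d\bar{r} = \Gamma\left(\tfrac{1}{2}\right)\Gamma\left(\tfrac{T}{2} + \tfrac{1}{2}\right)2^{T/2}\,\frac{d^k}{dm^k}\left[m^{-T/2}I_{T/2}(m)\right]. \tag{33}$$

Using the identity ([8], p. 79)

$$\frac{d}{dz}\left[z^{-\nu}I_\nu(z)\right] = z^{-\nu}I_{\nu+1}(z)$$

and changing the integration variable in (29) from $p$ to $m$, we obtain finally

$$E[\bar{r}^k] = \frac{T}{2}\,\rho^{-T/2}\int_0^\infty m^{T/2-1}\exp\left(-\frac{1+\rho^2}{2\rho}\,m\right)\frac{d^k}{dm^k}\left[m^{-T/2}I_{T/2}(m)\right]dm. \tag{34}$$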


For $k = 1$, we have ([8], p. 386)

$$E[\bar{r}] = \frac{T\rho}{T+2}. \tag{35}$$

For $k = 2$, after some tedious calculation, we find

$$E[\bar{r}^2] = \frac{1}{T+2} + \frac{\rho^2\,T(T+1)}{(T+2)(T+4)}, \qquad \sigma_{\bar{r}}^2 = \frac{1}{T+2} - \frac{\rho^2\,T(T-2)}{(T+2)^2(T+4)}. \tag{36}$$

We note that $\lim_{T\to\infty} E(\bar{r}) = \rho$ and $\lim_{T\to\infty}\sigma_{\bar{r}}^2 = 0$, so that at least to the extent of
approximation furnished by $R_\rho(\bar{r})$, $\bar{r}$ is a consistent estimate of $\rho$.
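The first two moments can be cross-checked by integrating the approximate density numerically. The sketch below assumes the density has the form $R_\rho(\bar{r}) \propto (1-\bar{r}^2)^{(T-1)/2}(1+\rho^2-2\rho\bar{r})^{-T/2}$ with the Dixon normalizing constant; the function names and the Simpson step count are illustrative choices, not part of the paper.

```python
import math


def leipnik_density(r, rho, T):
    """Approximate density of the circular serial correlation coefficient."""
    log_c = (math.lgamma(T / 2.0 + 1.0) - math.lgamma(0.5)
             - math.lgamma(T / 2.0 + 0.5))
    base = max(0.0, 1.0 - r * r)  # guard against rounding just outside [-1, 1]
    return (math.exp(log_c) * base ** ((T - 1) / 2.0)
            * (1.0 + rho * rho - 2.0 * rho * r) ** (-T / 2.0))


def moment(k, rho, T, steps=20000):
    """k-th moment by composite Simpson integration on [-1, 1] (steps even)."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps + 1):
        r = -1.0 + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * r ** k * leipnik_density(r, rho, T)
    return total * h / 3.0


rho, T = 0.5, 20
mean_numeric = moment(1, rho, T)
second_numeric = moment(2, rho, T)
mean_formula = T * rho / (T + 2)
second_formula = 1.0 / (T + 2) + rho ** 2 * T * (T + 1) / ((T + 2) * (T + 4))
```

Comparing `mean_numeric` with `mean_formula` and `second_numeric` with `second_formula` gives a quick consistency check of the closed forms for any $|\rho| < 1$ and $T > 2$.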

The author wishes to express his gratitude to Dr. T. Koopmans, under whose 
kind direction this paper was written. 
CONCERNING THE EFFECT OF INTRACLASS CORRELATION ON
CERTAIN SIGNIFICANCE TESTS 

By John E. Walsh 
Princeton University 

1. Summary. In practical applications it is frequently assumed that the
values obtained by a sampling process are independently drawn from the same
normal population. Then confidence intervals and significance tests which were
derived under the assumption of independence are applied using these values.
Often the assumption of independence between the values may be at best only
approximately valid. For some cases, however, it may be permissible to assume
that the correlation between each two values is the same (intraclass correlation).
The purpose of this paper is to investigate the effect of this intraclass correlation
on the confidence coefficients and significance levels of several well known
confidence intervals and significance tests which were derived under the
assumption of independence, and to extend these considerations to the case of two
sets of values.

In the first part of the paper the relations given in Table I are used to compute
tables which show the effect of intraclass correlation on the confidence coefficients
and significance levels of the confidence intervals and significance tests listed in
Table II. The second part of the paper consists of the proofs of the relations
given in Table I.

2. Introduction. Let the $n$ values $x_1, \ldots, x_n$ represent a single value of a
normal multivariate population for which each of the $n$ variables has mean $\mu$,
variance $\sigma^2$, and the correlation between each two variables is $\rho$. These $n$
values will be called a correlated "sample." The values $x_1, \ldots, x_n$ and
$y_1, \ldots, y_m$ are said to represent two correlated "samples" if they have a normal
multivariate distribution such that the $x$'s have mean $\mu$, variance $\sigma^2$, correlation
$\rho$, the $y$'s have mean $\mu'$, variance $\sigma'^2$, correlation $\rho'$, and the correlation between
each $x$ and $y$ is $\rho''$. This paper shows that several well known quantities which
have Student $t$, $\chi^2$, or Snedecor $F$ distributions when the values form random
samples still have those same distributions for correlated "samples" if the
quantities are multiplied by suitable constant factors, where it is to be remembered
that for normal populations a correlated "sample" is a random sample if and
only if $\rho = 0$ and that two correlated "samples" represent two random samples
if and only if $\rho = \rho' = \rho'' = 0$. The quantities considered and the corresponding
factors are listed in Table I, where $\bar{x} = \sum_1^n x_i/n$ and $\bar{y} = \sum_1^m y_\alpha/m$. Several
commonly used confidence intervals and significance tests based on these quantities
and derived under the assumption of randomness are considered, and tables are
computed which show how the confidence coefficients and significance levels of
these confidence intervals and significance tests vary if the values are from
correlated "samples" instead of random samples. Table II contains an outline
of the confidence intervals and significance tests considered. It is found that
these confidence coefficients and significance levels can change noticeably when a
correlated "sample" is considered. This is particularly true for the Student
$t$-test. For example, in one case it is found that if the sample size is 32 and the
significance level is .05 when $\rho = 0$, then the significance level becomes .23 for
$\rho = .05$. This large change in significance level for a small change in $\rho$ is
explained by the factor given for the Student $t$-distribution in Table I. This
shows that test results which appear to be "significant" under the assumption of
randomness are not necessarily "significant" when correlation is present, even
though the amount of correlation may be small. The effect of correlation on the

TABLE I 


Quantity 

Distribution For 

Random Sample 

Factor Multiplying 
Statistic lor 
Correlated “Samples" 

IS — p) V n(n — 1) {£ — p) V n(n — 1) 

Student t-distribation 

0»-i(/) dl 

J y-p 


S / « 

2) a 

y 1 + (n - l)p 

S' 1A, 

5)1 

x a -distnbution 

f„-,(x a ) dx 2 

1 

1 "" P 

<r'tS a ^ k* - e ) a 

0 V?' a *1 

i 

Snedecor E-distri¬ 
bution 

h n -i, m -i(F) dF 

1-p' 

1 - p 


$\chi^2$ and Snedecor $F$ tests is not as great as for the Student $t$-test, as can be seen from
the factors given for the $\chi^2$ and Snedecor $F$ distributions in Table I.

3. Effect of intraclass correlation. The relations stated in Table I will now
be used to investigate the effect of intraclass correlation on the confidence
coefficients and significance levels of several common types of confidence intervals
and significance tests which were derived under the assumption of random
samples. The confidence intervals and significance tests considered are listed
in Table II, where $S^2$ and $S'^2$ are defined in Table I. These particular confidence
intervals and significance tests have the property that if $\alpha$ is the confidence
coefficient of the confidence interval listed for a given statistic, then $1 - \alpha$ is
the significance level of the significance test listed for that statistic, this relation
holding whether random samples or correlated "samples" are considered. For
this reason the tables given in this section will be limited to confidence
coefficients; the corresponding significance levels can be obtained by using the above
relation.

a. Student $t$-distribution. If a random sample of size $n$ is drawn from a normal
population with mean $\mu$ and variance $\sigma^2$ (denoted by $N(\mu, \sigma^2)$), a confidence
interval for $\mu$ with confidence coefficient $\epsilon$ is given in Table II. If the $n$ values
form a correlated "sample", however, it follows from Table I that the
corresponding confidence interval with coefficient $\epsilon$ is

$$\bar{x} - t_\epsilon S\sqrt{\frac{1 + (n-1)\rho}{n(n-1)(1-\rho)}} \le \mu \le \bar{x} + t_\epsilon S\sqrt{\frac{1 + (n-1)\rho}{n(n-1)(1-\rho)}}.$$


TABLE II 


Stat¬ 

istic 

Para 

ractcr 

Exam¬ 

ined 

Confidence Interval 
(Confidence Coefficient <) 

Significance Teat 
(Significance Level « 1 — *) 

DcDnltlons of 

Constants 

t 

M 

Vn(n - 1) - M 

Sf+ l ‘ S 

V«(n — 1) 

1* - mI 

% l,8/Vu(n - 1) 

[ 0n-lU) dl >=> . 

J-t, 

X 2 

<T l 

SVxl 

— £ l/xl 

[ 2 /n-|Cx S ) “ « 

J x; 

F 

> 

0 s <r 3 / a'* 3 sy.w, 

o*8‘* , 

~ ]/F ‘ 

[ <P) (IF - a 

J F, 


The confidence interval given in Table II can be rewritten as

$$\bar{x} - t_\alpha S\sqrt{\frac{1 + (n-1)\rho}{n(n-1)(1-\rho)}} \le \mu \le \bar{x} + t_\alpha S\sqrt{\frac{1 + (n-1)\rho}{n(n-1)(1-\rho)}},$$

where

$$t_\alpha = t_\epsilon\sqrt{\frac{1-\rho}{1 + (n-1)\rho}}.$$

Hence if $\rho < 0$, $\alpha > \epsilon$ and the confidence coefficient of the confidence interval
in Table II is greater than $\epsilon$. This means that the significance level of the
corresponding significance test listed in Table II would be less than $1 - \epsilon$ so
that any test result which would be significant for a random sample would also
be significant for a correlated "sample" for which $\rho < 0$. If $\rho > 0$, however,
$\epsilon > \alpha$ and the significance level of the test would be greater than $1 - \epsilon$. Thus a
test result which would be significant for a random sample need no longer be
significant when $\rho > 0$. The effect of positive values of $\rho$ upon the confidence coefficient
$\alpha = \alpha_\epsilon(\rho, n)$ of the confidence interval of Table II is given in Table III for the
cases $\epsilon = .95$ and $.99$. Confidence intervals with unequal tails can be treated
in a similar manner. It is thus seen that the effect of correlation on the
confidence coefficient increases with the sample size $n$, and that even a very small
amount of correlation can cause a large change in $\alpha$. For example, for samples
of size 16 a correlation of $\rho = .05$ will change the significance level from .05 to
.135; for samples of size 32 a correlation of $\rho = .05$ will change the significance
level from .01 to .102, and from .05 to .23.
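The quoted changes in significance level can be reproduced approximately from the factor $\sqrt{(1-\rho)/(1+(n-1)\rho)}$. The following is a minimal sketch using only the standard library; the Student $t$ probabilities are obtained by Simpson integration and bisection, which is an implementation choice made here for self-containment, and all function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log1p(x * x / df))


def central_prob(c, df, steps=2000):
    """P(|t| <= c) by composite Simpson integration on [0, c] (steps even)."""
    if c <= 0.0:
        return 0.0
    h = c / steps
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2.0 * total * h / 3.0


def two_sided_critical_value(eps, df):
    """Solve P(|t| <= c) = eps for c by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if central_prob(mid, df) < eps:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def actual_confidence(eps, rho, n):
    """Confidence coefficient alpha of the nominal-eps interval under rho."""
    df = n - 1
    t_eps = two_sided_critical_value(eps, df)
    t_alpha = t_eps * math.sqrt((1.0 - rho) / (1.0 + (n - 1) * rho))
    return central_prob(t_alpha, df)


level_16 = 1.0 - actual_confidence(0.95, 0.05, 16)
level_32 = 1.0 - actual_confidence(0.95, 0.05, 32)
```

Here `level_16` and `level_32` are the actual significance levels of a nominal .05 test when $\rho = .05$; they should land close to the values quoted in the text for $n = 16$ and $n = 32$, with small differences attributable to rounding in the original tables.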

Confidence intervals for $\mu - \mu'$ are given by Theorem 5 of section 4. It is to be
observed that if $\rho = \rho' = \rho''$ and $\sigma = \sigma'$ the confidence coefficients are
independent of $\rho$ and $\sigma$. If $m = n$, $\rho = \rho'$, $\sigma = \sigma'$, $\rho'' = 0$, however, the confidence
coefficients of the confidence intervals for $\mu - \mu'$ have the values $\alpha = \alpha_\epsilon(\rho, n)$
given in Table III.


TABLE III 
Values of a t (p, n ) 


p 

0 

.05 

.1 

.2 

3 



n 









.00 


.083 

.974 

:* 3 * 

.944 

.920 

4 

,06 


.021 

.800 

mm 

.805 

.744 

8 

,00 


.059 

013 

.863 

.790 


05 


.865 

.767 

.620 



16 

.00 


.003 

.706 

.600 

.600 

.515 

.06 

.866 

.74 

.64 

.54 



32 

.09 


79 

.63 




.05 


.68 





64 

.00 

.79 






128 

.90 

.68 







b. $\chi^2$-distribution. If a random sample of size $n$ is drawn from $N(\mu, \sigma^2)$, a
confidence interval for $\sigma^2$ with coefficient $\epsilon$ is given in Table II. If the $n$ values form
a correlated "sample", it follows from Table I that the corresponding
confidence interval with coefficient $\epsilon$ is

$$0 \le \sigma^2 \le \frac{S^2}{\chi^2_\epsilon(1-\rho)}.$$

The confidence interval in Table II can be rewritten as

$$0 \le \sigma^2 \le \frac{S^2}{\chi^2_\alpha(1-\rho)},$$

where

$$\chi^2_\alpha = \chi^2_\epsilon/(1-\rho).$$

Hence if $\rho < 0$, $\alpha > \epsilon$ and the significance level of the significance test given in
Table II is less than $1 - \epsilon$. If $\rho > 0$, the significance level of the test is greater
than $1 - \epsilon$. The effect of positive values of $\rho$ upon the confidence coefficient
$\alpha = \alpha_{\chi^2}(\rho, n)$ of the confidence interval listed in Table II is given in Table IV
for $\epsilon = .95$ and $.99$. Cases in which the lower limit of the confidence interval
is not zero can be treated in a similar manner. Table IV shows that the
confidence coefficient $\alpha = \alpha_{\chi^2}(\rho, n)$ decreases with the sample size $n$ for a fixed value
of $\rho$. Although the effect of correlation for the $\chi^2$-distribution is not as great as
for the Student $t$-distribution, it does cause a noticeable change in $\alpha$. For
example, for samples of size 16 the significance level of the test in Table II is
changed from .05 to .081 if $\rho = .1$ and from .05 to .13 if $\rho = .2$. For samples of
size 32 the significance level is changed from .05 to .10 for $\rho = .1$ and from .05 to
.19 for $\rho = .2$.

c. Snedecor $F$-distribution. If two random samples, one of size $n$ (denoted
by $x$'s) and the other of size $m$ (denoted by $y$'s), are drawn from $N(\mu, \sigma^2)$
and $N(\mu', \sigma'^2)$ respectively, a confidence interval for $\sigma^2/\sigma'^2$ with coefficient $\epsilon$


TABLE IV 
Values of a x *(p, n) 


p 

0 

A 

2 

3 

4 

s. 








4 

.9!) 

.988 

.980 

983 

.979 

.1)71 

.95 

.941 

.930 

.918 

.900 

.872 

16 

.91) 

982 

900 

.1)41 

.890 

.790 

.95 

919 

87 


(17 

.49 

32 

99 

mm, 

946 


,716 

.44 

95 

mm. 

.81 

1 

.38 

.17 


is given in Table II. If the values form two correlated "samples", however,
it follows from Table I that the corresponding confidence interval with
coefficient $\epsilon$ is

$$0 \le \sigma^2/\sigma'^2 \le \frac{S^2(1-\rho')}{S'^2(1-\rho)}\Big/ F_\epsilon.$$

The confidence interval in Table II can be restated as

$$0 \le \sigma^2/\sigma'^2 \le \frac{S^2(1-\rho')}{S'^2(1-\rho)}\Big/ F_\alpha,$$

where

$$F_\alpha = F_\epsilon(1-\rho')/(1-\rho).$$

Thus if $\rho = \rho'$, $\alpha = \epsilon$ and the significance level of the significance test given in
Table II remains equal to $1 - \epsilon$. If $(1-\rho')/(1-\rho) < 1$, $\alpha > \epsilon$ and the
significance level is less than $1 - \epsilon$. If $(1-\rho')/(1-\rho) > 1$, however, $\alpha < \epsilon$
and the significance level is greater than $1 - \epsilon$. Values of the confidence
coefficient $\alpha = \alpha_F\left(\frac{1-\rho'}{1-\rho}, n, m\right)$ of the confidence interval listed in Table II are
given in Table V for $\epsilon = .95$ and $.99$. Cases in which the lower limit of the
confidence interval is not zero can be treated in a manner similar to that given
above. Table V indicates that the effect of correlation on the confidence
coefficient is not as great for $n < m$ as for $n > m$. For example, if $n = 4$, $m = 32$,


TABLE V 



$\frac{1-\rho'}{1-\rho} = 1.25$, the significance level of the significance test given in Table II is
only changed from .05 to .069, if $\frac{1-\rho'}{1-\rho} = 1.5$ from .05 to .087. If $n = 32$, $m = 4$,
$\frac{1-\rho'}{1-\rho} = 1.25$, however, the significance level is changed from .05 to .094, if
$\frac{1-\rho'}{1-\rho} = 1.5$ from .05 to .142. Also it is seen that for fixed $\frac{1-\rho'}{1-\rho}$, the effect of
intraclass correlation increases with both $n$ and $m$.
4. Analysis. This section contains derivations of the relations stated in the
first three sections. The method used in these derivations is similar to that used 
in one approach to the analysis of variance and consists essentially in expressing 
each variable as the sum of two quantities, one of which is the same for each 
variable and the other of which is different for each variable. 

Let $x_1, \ldots, x_n$ represent a correlated "sample", that is, have a normal
multivariate distribution for which

$$\begin{aligned}
E(x_i) &= \mu, & (i = 1, \ldots, n) \\
E[(x_i - \mu)^2] &= \sigma^2, & (i = 1, \ldots, n) \\
E[(x_i - \mu)(x_j - \mu)] &= \rho\sigma^2, & (i \ne j = 1, \ldots, n).
\end{aligned} \tag{1}$$

Write the $x_i$, $(i = 1, \ldots, n)$, in the form

$$x_i = \eta + \lambda\bar\xi + \xi_i,$$

where $\bar\xi = \sum_1^n \xi_i/n$ and $\eta$, $\xi_i$ are independently distributed, $\eta$ according to
$N(\mu, \sigma_1^2)$ and the $\xi_i$ according to $N(0, \sigma_2^2)$. The values of $\lambda$, $\sigma_1^2$ and $\sigma_2^2$ are chosen
so that the $x_i = \eta + \lambda\bar\xi + \xi_i$ satisfy (1). It is easily proved that it is always
possible to choose $\lambda$, $\sigma_1^2$ and $\sigma_2^2$ so that (1) are satisfied. It is to be remembered
that $\rho \ge -1/(n-1)$ for intraclass correlation. From relations (1) and
$x_i = \eta + \lambda\bar\xi + \xi_i$ it follows that

$$E(\xi_i^2) = \sigma^2(1-\rho), \qquad (i = 1, \ldots, n). \tag{2}$$

Theorem 1. The quantity $\frac{1}{\sigma^2(1-\rho)}\sum_1^n (x_i - \bar{x})^2$ has a $\chi^2$-distribution with
$n - 1$ degrees of freedom and is distributed independently of $\bar{x}$.

Proof. Since the $\xi_i$ are independently distributed according to the same
normal distribution with zero mean, it follows from (2) that

$$\frac{1}{E(\xi_i^2)}\sum_1^n (\xi_i - \bar\xi)^2 = \frac{1}{\sigma^2(1-\rho)}\sum_1^n (x_i - \bar{x})^2$$

has a $\chi^2$-distribution with $n - 1$ degrees of freedom and is distributed
independently of $\bar{x} = \eta + (1 + \lambda)\bar\xi$.


Theorem 2. $\dfrac{(\bar{x} - \mu)\sqrt{n(n-1)}}{\sqrt{\sum_1^n (x_i - \bar{x})^2}}\sqrt{\dfrac{1-\rho}{1 + (n-1)\rho}}$ has a Student $t$-distribution
with $n - 1$ degrees of freedom.

Proof. It is easily seen from elementary considerations that

$$\frac{(\bar{x} - \mu)\sqrt{n}}{\sigma\sqrt{1 + (n-1)\rho}}$$

has the distribution $N(0, 1)$. Theorem 2 is then an immediate consequence of
Theorem 1.
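The two facts used in this proof, the variance of $\bar{x}$ and the scaling of the sum of squares, can be illustrated by simulation. The sketch below is not the paper's construction: it assumes $\rho \ge 0$ and builds an intraclass-correlated sample from one shared normal component plus independent components, and the function name, seed and parameter values are arbitrary illustrative choices.

```python
import math
import random


def simulate(n, rho, mu, sigma, reps, seed=0):
    """Return (sample variance of xbar, mean of SS / (sigma^2 (1 - rho)))."""
    rng = random.Random(seed)
    means = []
    scaled_ss_total = 0.0
    for _ in range(reps):
        common = rng.gauss(0.0, 1.0)  # shared part induces correlation rho
        xs = [mu + sigma * (math.sqrt(rho) * common
                            + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
              for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        means.append(xbar)
        scaled_ss_total += ss / (sigma ** 2 * (1.0 - rho))
    grand = sum(means) / reps
    var_xbar = sum((m - grand) ** 2 for m in means) / (reps - 1)
    return var_xbar, scaled_ss_total / reps


var_xbar, mean_scaled_ss = simulate(n=8, rho=0.3, mu=1.0, sigma=2.0, reps=20000)
expected_var_xbar = 2.0 ** 2 * (1.0 + 7 * 0.3) / 8   # sigma^2 [1 + (n-1) rho] / n
```

With these settings `var_xbar` should be near `expected_var_xbar` and `mean_scaled_ss` near $n - 1 = 7$, the mean of a $\chi^2$ variable with $n - 1$ degrees of freedom, up to simulation error.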


Up to this point a single correlated "sample" of size $n$ has been considered.
The next part of the analysis, however, will be concerned with properties which
Let $x_1, \ldots, x_n, y_1, \ldots, y_m$ have a joint normal multivariate distribution
such that

$$\begin{aligned}
E(x_i) &= \mu, & (i = 1, \ldots, n) \\
E(y_\alpha) &= \mu', & (\alpha = 1, \ldots, m) \\
E[(x_i - \mu)^2] &= \sigma^2, \\
E[(y_\alpha - \mu')^2] &= \sigma'^2, \\
E[(x_i - \mu)(x_j - \mu)] &= \rho\sigma^2, & (i \ne j = 1, \ldots, n) \\
E[(y_\alpha - \mu')(y_\beta - \mu')] &= \rho'\sigma'^2, & (\alpha \ne \beta = 1, \ldots, m) \\
E[(x_i - \mu)(y_\alpha - \mu')] &= \rho''\sigma\sigma'.
\end{aligned} \tag{3}$$

Write the $x_i$ and $y_\alpha$ in the form

$$\begin{aligned}
x_i &= \eta + \lambda_1\bar\xi + \lambda_2\bar\zeta + \xi_i, \\
y_\alpha &= \eta' + \lambda_1'\bar\xi + \lambda_2'\bar\zeta + \zeta_\alpha,
\end{aligned} \tag{4}$$

where $\bar\zeta = \sum_1^m \zeta_\alpha/m$ and $\eta, \eta', \xi_1, \ldots, \xi_n, \zeta_1, \ldots, \zeta_m$ are independently
distributed, $\eta$ according to $N(\mu, \sigma_1^2)$, $\eta'$ according to $N(\mu', \sigma_1'^2)$, the $\xi_i$ according to
$N(0, \sigma_2^2)$, and the $\zeta_\alpha$ according to $N(0, \sigma_2'^2)$. The quantities $\lambda_1$, $\lambda_2$, $\lambda_1'$, $\lambda_2'$,
$\sigma_1^2$, $\sigma_1'^2$, $\sigma_2^2$, $\sigma_2'^2$ are chosen so that the $x_i$ and $y_\alpha$ satisfy (3). It is easily verified
that it is always possible to choose these quantities so that the $x_i$ and $y_\alpha$
constructed in this fashion satisfy (3). In addition it follows from (3) and (4) that

$$E(\xi_i^2) = \sigma^2(1-\rho), \qquad E(\zeta_\alpha^2) = \sigma'^2(1-\rho'). \tag{5}$$
Theorem 3. $\frac{1}{\sigma^2(1-\rho)}\sum_1^n (x_i - \bar{x})^2$ and $\frac{1}{\sigma'^2(1-\rho')}\sum_1^m (y_\alpha - \bar{y})^2$ have
$\chi^2$-distributions with $n - 1$ and $m - 1$ degrees of freedom respectively, and are
distributed independently of each other and of $\bar{x}$ and $\bar{y}$.

Proof. From Theorem 1 and (5) it follows that $\frac{1}{\sigma^2(1-\rho)}\sum_1^n (x_i - \bar{x})^2$
and $\frac{1}{\sigma'^2(1-\rho')}\sum_1^m (y_\alpha - \bar{y})^2$ have $\chi^2$-distributions with $n - 1$ and $m - 1$ degrees
of freedom respectively. That they are distributed independently of each other
and of both $\bar{x}$ and $\bar{y}$ follows from (4).


Theorem 4 . 


*' 2 (i - p0 Z (*, - *)* 

1 __ 

v 2 (i — p) 23 (to — 77) 2 


is distributed according to the Snedecor 


F-distribution h„- 1 , m -i(F)dF. 

Proof. This follows from Theorem 3. 
Theorem 5.

[(£ - fi) - (p- l*')Wn + m - 2 / (*, - 2) 1 . £(j u ~ n) 1 

_ f y ^ ^ + ^ 

where 

*\ - ~ [1 + (» - l) P ] + ^ [l + (m - 1 )p'] - VW, 

has a Student $t$-distribution with $n + m - 2$ degrees of freedom.

Proof, It is easily seen from elementary considerations that ' [(£ — y)~ 

(p - p')] has the distribution A r (0, 1). Theorem 5 then follows from Theorem 3. 

The author wishes to express his appreciation to Professor John W. Tukey for 
valuable assistance and advice in the preparation of this paper. 



ON FAMILIES OF ADMISSIBLE TESTS 

By E. L. Lehmann 
University of California, Berkeley 

1. Summary. For each hypothesis $H$ of a certain class of simple hypotheses, a
family $F$ of tests is determined such that

(a) given any test $w$ of $H$ there exists a test $w'$ belonging to $F$ which has power
uniformly greater than or equal to that of $w$.

(b) no member of $F$ has power uniformly greater than or equal to that of any
other member of $F$.

The effect on $F$ of various assumptions about the set of alternatives is
considered. As an application an optimum property of the known type $A_1$ tests is
proved, and a result is obtained concerning the most stringent tests of the
hypotheses considered.

2. Introduction. In the theory of testing simple hypotheses, if a uniformly
most powerful test exists, it is the most desirable test to use. If, as is generally
the case, such a test does not exist, the choice between tests none of which is
"altogether better" than all the others has to be based on information not
contained in the general formulation of the testing problem. If no such additional
information is available, the choice must of necessity be somewhat arbitrary.

Now although a single uniformly most powerful test exists only in exceptional
cases, there will always exist a family $F$ of tests such that

(a) given any test $w$ of the hypothesis $H$ under consideration and of prescribed
level of significance, there exists a test $w'$ belonging to $F$ which has power
uniformly greater than or equal to that of $w$.

(b) no member of $F$ has power uniformly greater than or equal to that of any
other member of $F$.

The family $F$ is essentially unique. Arbitrariness occurs only since a test region
is not uniquely determined by its power function. But since two tests with the
same power function are equivalent for testing purposes, it is from the present
point of view immaterial which one is included in $F$.

With the same restriction $F$ is essentially the family of admissible tests, a
test $w$ being admissible if there is no test of the same level of significance which
has power uniformly greater than or equal to but not identically equal to that of
$w$. This definition differs only trivially from the one given by Wald [1, p. 15]
who defines a test $w$ to be non-admissible if there exists a $w'$ with power
everywhere greater than that of $w$ (except at the hypothetical point).

$F$ naturally depends on the class of alternatives considered. A restriction
in the class of alternatives may (although it will not necessarily) diminish $F$.
The family $F$ may also be decreased by other additional information: for
instance a probability distribution may be assumed for the set of alternatives,
and some properties of this distribution may be presupposed.
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The determination of the family F, (and a description of the power functions 
of the tests in F) might be considered a solution of the testing problem. The 
solution is not unique and hence does not provide a basis for action. This 
reflects the fact that additional information is needed to make possible the 
unique choice of a best test. On the basis of the available information, F repre¬ 
sents the furthest reduction of the problem that seems possible. On the one 
hand, if the choice of test is to be made from the point of view of power, the only 
contestants for “best test’ 1 are the members of F. On the other hand, the 
available information docs not give preference to any one, member of F over any 
other unless additional principles (such as unbiasedness for instance) are 
introduced. 

It is the purpose of the present paper to illustrate the above notions by determining
F for a very simple case.


3. Determination of the family F. Let the random variable

$E = (X_1, X_2, \cdots, X_n)$

have a probability density function

(1)  $p_\theta(e)$

depending on a parameter $\theta$. Concerning (1) we shall make the assumptions
under which Neyman [2, 3] has shown the existence of the type $A_1$ test of the
hypothesis

(2)  $H: \theta = \theta_0$.


Assumptions:

(a) Conditions of regularity:

The integral

(3)  $\int_w p_\theta(e)\,de$, where $e = (x_1, \cdots, x_n)$ and $de = dx_1 \cdots dx_n$,

extended over any region w in the sample space, admits of two successive derivatives
with respect to $\theta$ under the integral sign, i.e.

(4)  $\frac{d^k}{d\theta^k}\int_w p_\theta(e)\,de = \int_w \frac{\partial^k}{\partial\theta^k}p_\theta(e)\,de$  for $k = 1, 2$.

(b) A differential equation:

If

(5)  $\varphi_\theta(e) = \frac{\partial}{\partial\theta}\log p_\theta(e)$,

then $\varphi'_\theta$, the derivative of $\varphi_\theta$ with respect to $\theta$, is not identically zero,
and there exist functions of $\theta$ (but independent of e), A and B, such that

(6)  $\varphi'_\theta = A + B\varphi_\theta$.

Under these assumptions Neyman has shown

A. that the probability density function $p_\theta$ is of the form

(7)  $p_\theta(e) = \exp\{P(\theta) + T(e)\cdot Q(\theta) + R(e)\}$

where Q is a monotone function with $\frac{d}{d\theta}Q(\theta)\big|_{\theta=\theta_0} \neq 0$ (without loss of
generality we shall assume Q monotonely increasing) and

B. that the type $A_1$ test of the hypothesis H exists, and is given by

(8)  $T(e) < c_1$,  $T(e) > c_2$

for suitable choice of $c_1$ and $c_2$.

In what follows we shall assume that the permissible first kind error in testing
H is fixed throughout and has the value $\epsilon$. By a test w of H we shall always
mean a test of level of significance $\epsilon$, i.e. satisfying

(9)  $\int_w p_{\theta_0}(e)\,de = \epsilon$.


Let us consider the family of tests

(10)  $w(k)$:  $T(e) < k$,  $T(e) > f(k)$;  $k < f(k)$

where f(k) is determined by (9). It easily follows from (9) that k can take on
all values from $-\infty$ to $k_0$, say, where $k_0$ is such that

(11)  $f(k_0) = +\infty$.

For the family F of tests $\{w(k)\}$, $-\infty \le k \le k_0$, we now state

Theorem 1. All members of F are admissible, and if w is any admissible test
not in F, there exists a member of F which has power identical with that of w.

We first prove the

Lemma. Let $\beta_w$ denote the power function of a test w. Then if $k_1 < k_2$

(12)  $\beta_{w(k_1)}(\theta) < \beta_{w(k_2)}(\theta)$  if $\theta < \theta_0$,
      $\beta_{w(k_1)}(\theta) > \beta_{w(k_2)}(\theta)$  if $\theta > \theta_0$.

Proof: Let $\bar{w}$ denote the complement of a region w.

Consider the intervals

(13)  $I = w(k_1)\cdot\bar{w}(k_2)$,
      $J = \bar{w}(k_1)\cdot w(k_2)$.

I lies entirely to the right of J. Let $\theta > \theta_0$. Then

(14)  $\frac{p_\theta(e)}{p_{\theta_0}(e)} = \exp\{P(\theta) - P(\theta_0)\}\cdot\exp\{T(e)[Q(\theta) - Q(\theta_0)]\}$

is a strictly increasing function of T since Q is increasing. Therefore there exists a
constant C such that

(15)  $\frac{p_\theta(e)}{p_{\theta_0}(e)} < C$  if T(e) is in J,
      $C < \frac{p_\theta(e)}{p_{\theta_0}(e)}$  if T(e) is in I.

Since

(16)  $\int_{w(k_1)} p_{\theta_0}(e)\,de = \int_{w(k_2)} p_{\theta_0}(e)\,de$

we have

(17)  $\int_I p_{\theta_0}(e)\,de = \int_J p_{\theta_0}(e)\,de$

and therefore

(18)  $\int_J p_\theta(e)\,de < C\cdot\int_J p_{\theta_0}(e)\,de = C\cdot\int_I p_{\theta_0}(e)\,de < \int_I p_\theta(e)\,de$

from which it follows that

(19)  $\int_{w(k_2)} p_\theta(e)\,de < \int_{w(k_1)} p_\theta(e)\,de$

which is the desired result.
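
The ordering asserted by the Lemma can be checked numerically in a special case. The following is a minimal sketch, assuming a single observation from a normal distribution with unit variance and $\theta_0 = 0$, so that T(e) = x; the function names `upper_cutoff` and `power` are illustrative and do not appear in the paper.

```python
from statistics import NormalDist

STD = NormalDist()  # distribution of T = x under theta_0 = 0 (assumed special case)


def upper_cutoff(k, eps):
    """f(k): chosen so that P(T < k) + P(T > f(k)) = eps under theta_0, as in (9)."""
    left = STD.cdf(k)
    if left >= eps:
        raise ValueError("k must lie below k_0 = inv_cdf(eps)")
    return STD.inv_cdf(1.0 - (eps - left))


def power(k, theta, eps):
    """Power of w(k) at theta when T is normal with mean theta and unit variance."""
    f_k = upper_cutoff(k, eps)
    return STD.cdf(k - theta) + 1.0 - STD.cdf(f_k - theta)


eps = 0.05
k1, k2 = -3.0, -2.0  # k1 < k2, both below k_0
for theta in (-1.0, 1.0):
    # The Lemma predicts power(k1) < power(k2) for theta < 0 and the reverse for theta > 0.
    print(theta, power(k1, theta, eps), power(k2, theta, eps))
```

Neither test dominates the other over the whole range of $\theta$, which is the fact used later to show that every member of F is admissible.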

Proof of Theorem 1. The proof consists of several parts.

I. Let m be any real number, and assume that there exists a value of k such
that

(20)  $\beta_w(\theta_0) = \epsilon$

(21)  $\frac{d}{d\theta}\beta_w(\theta)\big|_{\theta=\theta_0} = m$

for w = w(k). Then w(k) has power uniformly greater than or equal to that of
any other test satisfying (20) and (21).

For m = 0 this becomes Neyman's theorem stating that the type A test is
also of type $A_1$. The proof of the theorem however is independent of the value of

(23)  $\frac{d}{d\theta}\beta_w(\theta)\big|_{\theta=\theta_0}$

and hence carries over to arbitrary m.

II. If there exists any test satisfying (20) and (21) then there exists a number
k for which w(k) also satisfies (20) and (21).

To prove this let us determine, of all tests satisfying (20), the one which
maximizes

(24)  $\frac{d}{d\theta}\beta_w(\theta)\big|_{\theta=\theta_0} = \int_w \frac{\partial}{\partial\theta}p_\theta(e)\big|_{\theta=\theta_0}\,de$.

This can be done by means of the lemma of Neyman and Pearson [4, p. 11]
which gives sufficient conditions for a region w, subject to restrictions

(25)  $\int_w f_i(e)\,de = a_i$,  $(i = 1, \cdots, p)$,

to maximize an integral

(26)  $\int_w g(e)\,de$.

According to this lemma the desired test is of the form

(25)  $\frac{\partial}{\partial\theta}p_\theta(e)\big|_{\theta=\theta_0} > a\cdot p_{\theta_0}(e)$

provided a value of a exists for which this test satisfies (20). (25) is equivalent to

(26)  $P'(\theta_0) + T(e)\cdot Q'(\theta_0) > a$  from (7)

or, since $Q'(\theta_0) > 0$, to

(27)  $T(e) > b$.

Thus, if a number b exists such that the test (27) satisfies (20), this test is the one
maximizing (24). But such a number does exist, namely $f(-\infty)$. Therefore
$w(-\infty)$ is the desired test.

Similarly it is easy to show that of all tests satisfying (20), $w(k_0)$ minimizes (24).
But

(28)  $\frac{d}{d\theta}\beta_{w(k)}(\theta)\big|_{\theta=\theta_0} = \int_{w(k)} \frac{\partial}{\partial\theta}p_\theta(e)\big|_{\theta=\theta_0}\,de$

is a continuous function of k, and therefore takes on all intermediate values,
which establishes II.

III. From I and II we conclude that given any test w there exists a member
of F which has power uniformly greater than or equal to that of w. For let w
be any test of H. From the condition of regularity it follows that its power
function has a derivative at $\theta_0$. By II there exists a value of k such that the
power function of w(k) has the same slope at $\theta_0$, and from I it follows that
w(k) is uniformly more powerful than w.

But from the lemma we see that none of the tests w(k) is uniformly more
powerful than any other. Hence all members of F are admissible, and the
theorem is proved.

From the lemma and Theorem 1 we can conclude for all members of F the
following optimum property:

Corollary 1: Let w be any test, and let $w_0$ be any member of F. Then at least
one of the two statements

(29)  $\beta_w(\theta) \le \beta_{w_0}(\theta)$  for all $\theta < \theta_0$
      $\beta_w(\theta) \le \beta_{w_0}(\theta)$  for all $\theta > \theta_0$

holds.

The lemma and Theorem 1 also give the following result concerning most
stringent tests, defined by Wald [1, p. 33].

Corollary 2: There exists a uniformly most powerful of all most stringent tests.
It is that unique member $w_0$ of F for which

$\operatorname{l.u.b.}_{\theta<\theta_0}\,[\operatorname{l.u.b.}_{w}\,\beta_w(\theta) - \beta_{w_0}(\theta)] = \operatorname{l.u.b.}_{\theta>\theta_0}\,[\operatorname{l.u.b.}_{w}\,\beta_w(\theta) - \beta_{w_0}(\theta)]$.


4. The effect on F of assumptions about the alternatives. Let us next consider
how a restriction in the set of alternatives affects the family F. From the lemma
it follows that there is no change as long as the set of alternatives contains
values of $\theta$ both greater and less than $\theta_0$. On the other hand, if the alternatives
are restricted to values of $\theta$ greater than $\theta_0$, say, the family F for testing H
against these alternatives consists of only a single member, the test $w(-\infty)$
(and similarly for the other one-sided case). This follows from

Theorem 2: Under conditions a. and b., the test $w(-\infty)$ is uniformly most
powerful against the alternatives $\theta > \theta_0$, the test $w(k_0)$ is uniformly most powerful
against the alternatives $\theta < \theta_0$.

Proof: Let w be any test. By Theorem 1 there exists a number k such that

(30)  $\beta_w(\theta) \le \beta_{w(k)}(\theta)$  for all $\theta$.

From the lemma it follows that

(31)  $\beta_{w(k)}(\theta) \le \beta_{w(k_0)}(\theta)$  if $\theta < \theta_0$
      $\beta_{w(k)}(\theta) \le \beta_{w(-\infty)}(\theta)$  if $\theta > \theta_0$.

Combining (30) and (31) we have the desired result.

(It is also easy to prove Theorem 2 directly from the Neyman-Pearson lemma.)

In order to illustrate how the assumption of an a priori distribution of $\theta$
together with some information about this distribution affects F, let us consider a
special case of the class of hypotheses discussed so far.

Let

(32)  $p_\theta(x_1, \cdots, x_n) = c\cdot e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \theta)^2}$

so that $E = (X_1, \cdots, X_n)$ is a sample from a normal distribution with unit
variance and unknown mean. We want to test the hypothesis

(33)  $H: \theta = 0$.

We shall show that if $\theta$ has a probability density function g which is symmetric
about the origin, then the family F for testing H consists, as might be expected,
of a single member, the type $A_1$ test.

Our problem is to find the test w satisfying

(34)  $\int_w p_0(x_1, \cdots, x_n)\,dx_1 \cdots dx_n = \epsilon$

and which maximizes

(35)  $\int_{-\infty}^{\infty} g(\theta)\int_w p_\theta(x_1, \cdots, x_n)\,dx_1 \cdots dx_n\,d\theta$.

Inverting the order of integration, which is permissible in this case, the
Neyman-Pearson lemma shows the desired test to be of the form

(36)  $\int_{-\infty}^{\infty} g(\theta)\,p_\theta(x_1, \cdots, x_n)\,d\theta > a\cdot p_0(x_1, \cdots, x_n)$

provided a value of a exists for which (36) satisfies (34). Substituting from
(32), (36) becomes

(37)  $f(\bar{x}) = \int_{-\infty}^{\infty} g(\theta)\,e^{-\frac{1}{2}n\theta^2 + n\theta\bar{x}}\,d\theta > a$

where

(38)  $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Since

(39)  $\frac{d^2}{d\bar{x}^2}f(\bar{x}) > 0$

the region (37) is either empty, which would contradict (34), or else can be
described by inequalities

(40)  $\bar{x} < a_1$,  $\bar{x} > a_2$

where

(41)  $f(a_1) = f(a_2)$,

the latter equation becoming, on substitution from (37),

(42)  $\int_{-\infty}^{\infty} g(\theta)\,e^{-\frac{1}{2}n\theta^2}\,[e^{na_1\theta} - e^{na_2\theta}]\,d\theta = 0$.

If g is an even function, (42) is certainly satisfied when $a_1 = -a_2$. Our test
then becomes

(43)  $\bar{x} < -a_2$,  $\bar{x} > a_2$

which for proper choice of $a_2$ satisfies (34) and is the well known type $A_1$ test.
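
The "proper choice of $a_2$" in (43) can be made explicit for the normal case. The sketch below assumes the model (32), under which $\bar{x}$ is normal with mean $\theta$ and variance 1/n; the helper names are illustrative and not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def type_a1_cutoff(n, eps):
    """a_2 such that P(|xbar| > a_2) = eps when theta = 0 and xbar ~ N(0, 1/n)."""
    return NormalDist().inv_cdf(1.0 - eps / 2.0) / sqrt(n)


def power_two_sided(theta, n, eps):
    """Power at theta of the symmetric test xbar < -a_2 or xbar > a_2."""
    a2 = type_a1_cutoff(n, eps)
    xbar = NormalDist(mu=theta, sigma=1.0 / sqrt(n))
    return xbar.cdf(-a2) + 1.0 - xbar.cdf(a2)


n, eps = 25, 0.05
print(type_a1_cutoff(n, eps))
print(power_two_sided(0.0, n, eps), power_two_sided(0.3, n, eps), power_two_sided(-0.3, n, eps))
```

The power function is symmetric about $\theta = 0$ and has its minimum there, which is the unbiasedness property that singles this test out within F.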
5. Concluding remarks. Let us consider once more a probability density
function satisfying a and b. We have seen that the family F for testing H
against the alternatives $\theta \neq \theta_0$ contains an infinity of elements unless we make
some additional assumptions. On the other hand, if the principle of unbiasedness
is accepted, F shrinks to a single element: the type $A_1$ test.

But unbiasedness does not insure power. Thus conceivably some other test
might be more powerful than the test chosen, everywhere except in a small
one-sided neighbourhood of $\theta_0$. That this is not so is shown by Corollary 1 to
Theorem 1. This remark illustrates how intuitively appealing principles and a
knowledge of the family F may be used in conjunction to arrive at a choice of
a satisfactory test, when not enough information is available to make the choice
compelling.

Finally, it should be pointed out that although we restricted our considerations
to simple hypotheses, the notions developed also apply to composite hypotheses.

CONDITIONAL EXPECTATION AND UNBIASED SEQUENTIAL ESTIMATION¹

By David Blackwell
Howard University

1. Summary. It is shown that $E[f(x)E(y \mid x)] = E(fy)$ whenever $E(fy)$
is finite, and that $\sigma^2 E(y \mid x) \le \sigma^2 y$, where $E(y \mid x)$ denotes the conditional
expectation of y with respect to x. These results imply that whenever there is a
sufficient statistic u and an unbiased estimate t, not a function of u only, for a
parameter $\theta$, the function $E(t \mid u)$, which is a function of u only, is an unbiased
estimate for $\theta$ with a variance smaller than that of t. A sequential unbiased
estimate for a parameter is obtained, such that when the sequential test terminates
after i observations, the estimate is a function of a sufficient statistic for the
parameter with respect to these observations. A special case of this estimate is
that obtained by Girshick, Mosteller, and Savage [4] for the parameter of a
binomial distribution.

2. Conditional expectation. Denote by x any (not necessarily numerical)
chance variable and by y any numerical chance variable for which E(y) is finite.
There exists a function of x, the conditional expectation of y with respect to x
[3, pp. 95-101; 5, pp. 41-44], which we denote, as usual, by $E(y \mid x)$ and which is
uniquely defined except for events of zero probability, such that whenever f(x)
is the characteristic function of an event F depending only on x (i.e. f = 1 when
F occurs and f = 0 when F does not occur), the equation

(1)  $E[f(x)E(y \mid x)] = E[f(x)y]$

holds. Now if f(x) is a simple function, i.e. a finite linear combination of
characteristic functions, it is clear from the linearity of expectation that (1) continues
to hold. Quite generally, we shall prove

Theorem 1: The equation (1) holds for every function f(x) for which E[f(x)y]
is finite.

To simplify notation, we write $E(z \mid x) = E_x z$ for any chance variable z. The
following corollary to Theorem 1 asserts simply that the operations $E_x$ and
multiplication by f(x) are commutative. This fact, which is trivially equivalent
to Theorem 1, has been stated by Kolmogoroff [5, p. 50].

Corollary: If E[f(x)y] is finite, then $E_x[f(x)y] = f(x)E_x y$.

Proof of Corollary: If g(x) is a characteristic function, then $E(gfE_x y) = E(gfy)$
by Theorem 1. Since $E_x(fy)$ is unique, the Corollary follows.

Proof of Theorem 1: Since Theorem 1 holds when f(x) is a simple function
and the product of a simple function and a characteristic function is a simple
function, the Corollary holds when f(x) is a simple function.

¹ The author is indebted to M. A. Girshick for suggesting the problem which led to this
paper and for many helpful discussions.

Now let f(x) be any function for which E(fy) is finite. There is a sequence of
simple functions $f_n(x)$ such that $f_n(x) \to f(x)$ and $|f_n(x)| \le |f(x)|$. For instance
we may define $f_n(x) = m/n$ when $m/n \le f(x) < (m+1)/n$, $0 \le m \le n^2$; $f_n(x) = m/n$
when $(m-1)/n < f(x) \le m/n$, $0 \ge m \ge -n^2$; $f_n(x) = 0$ otherwise.

We recall the following proposition of Doob [2, p. 296]:

(2)  $|E_x y| \le E_x |y|$

with probability one. Then, using the Corollary (for simple functions) and
(2), we have $|f_n E_x y| = |E_x(f_n y)| \le E_x |f_n y| \le E_x |fy|$. Also

(3)  $E(f_n E_x y) = E(f_n y)$.

Since the two sequences of functions $f_n E_x y$, $f_n y$ are bounded in absolute value by
the summable functions $E_x |fy|$, $|fy|$, Lebesgue's theorem [8, p. 29] applied
to (3) yields (1).
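
The approximating sequence used in the proof can be written out directly. The following sketch evaluates $f_n$ at a point where f takes a given value, using exact rational arithmetic so that the inequality $|f_n| \le |f|$ is not blurred by rounding; the function name `simple_approx` is illustrative.

```python
from fractions import Fraction
from math import ceil, floor


def simple_approx(value, n):
    """f_n at a point where f takes `value`: round toward zero on a grid of
    step 1/n, and return 0 when the grid index m falls outside [-n^2, n^2]."""
    scaled = Fraction(value) * n
    m = floor(scaled) if scaled >= 0 else ceil(scaled)
    return Fraction(m, n) if abs(m) <= n * n else Fraction(0)


v = Fraction(22, 7)
for n in (1, 10, 100):
    print(n, simple_approx(v, n), simple_approx(-v, n))
```

Each $f_n$ takes only finitely many values, so it is a simple function, and the truncation at $|m| \le n^2$ is what keeps the number of values finite for unbounded f.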

In section 3 we shall use the fact that if u is a sufficient statistic for a parameter
$\theta$ and f is any unbiased estimate for $\theta$, then $E(f \mid u)$ (which, since u is a sufficient
statistic, is a function of u independent of $\theta$) is an unbiased estimate for $\theta$. This is
obvious, since it follows from the definition of conditional expectation that the
two chance variables f and $E(f \mid u)$ have the same expected value. The interesting
fact is that the estimate $E(f \mid u)$ is always a better estimate for $\theta$ than f in the
sense of having a smaller variance, unless f is already a function of u only, in
which case the two estimates f and $E(f \mid u)$ clearly coincide. This is simply the
fact that the variance of the regression function of f on u is not greater than the
variance of f. In the case of Gaussian variables, where the regression is linear,
this fact has been noted by Doob [1, p. 231].² Our statement is embodied in

Theorem 2: If $\sigma^2 y$ is finite, so is $\sigma^2 E_x y$, and $\sigma^2 E_x y \le \sigma^2 y$, with equality holding
only if $E_x y = y$ with probability one.

Proof: Denote by m the common expected value of y and $E_x y$. Suppose for
the moment that $\sigma^2 E_x y$ is finite. By the Schwarz inequality $E[yE_x y]$ is then
finite. Then $\sigma^2 y = E(y - m)^2 = E[(y - E_x y) + (E_x y - m)]^2 = E(y - E_x y)^2 + \sigma^2 E_x y$,
since $E[E_x y(E_x y - m)] = E[y(E_x y - m)]$ by Theorem 1. Thus $\sigma^2 y$
exceeds $\sigma^2 E_x y$ by $E(y - E_x y)^2$, which is positive unless $y = E_x y$, i.e. y is a function
of x. Thus we obtain the usual decomposition: the variance of y is the
variance of the regression of y on x plus the variance of y about the regression of
y on x.
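
The decomposition in the proof can be verified exactly on a small discrete example. The joint distribution below is hypothetical and chosen only for illustration; the helper names are likewise not from the paper.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution of (x, y): {(x, y): probability}.
joint = {
    ("a", 0): Fraction(1, 4), ("a", 2): Fraction(1, 4),
    ("b", 1): Fraction(1, 6), ("b", 5): Fraction(1, 3),
}


def conditional_expectation(joint):
    """Return {x: E(y | x)} for a finite joint distribution."""
    mass, weighted = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        mass[x] += p
        weighted[x] += p * y
    return {x: weighted[x] / mass[x] for x in mass}


def expectation(joint, g):
    """E[g(x, y)] under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


cond = conditional_expectation(joint)
m = expectation(joint, lambda x, y: y)
var_y = expectation(joint, lambda x, y: (y - m) ** 2)
var_regression = expectation(joint, lambda x, y: (cond[x] - m) ** 2)
var_about_regression = expectation(joint, lambda x, y: (y - cond[x]) ** 2)
print(var_y, var_regression, var_about_regression)
```

Because the arithmetic is exact, the identity "variance of y = variance of the regression + variance about the regression" holds as an equality of fractions rather than approximately.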

To show that $\sigma^2 E_x y$ is finite, we require the following

Lemma (Schwarz inequality): If $E(f^2)$ and $E(g^2)$ are finite, then, with
probability one,

$E_x^2(fg) \le E_x(f^2)E_x(g^2)$.

A proof can be constructed on the usual lines by considering the function
$Q(x, \lambda) = E_x(f + \lambda g)^2$. There are, however, certain measure-theoretic difficulties
in handling simultaneously the conditional expectations of the family of chance
variables $(f + \lambda g)^2$; instead we shall give a simple direct proof based on the
ordinary Schwarz inequality for integrals.

² For functions of finite variance it is possible to interpret conditional expectation as a
projection in Hilbert space, when the statement becomes simply the Bessel inequality.

We may suppose $f \ge 0$, $g \ge 0$ with probability one, since, from (2),

$E_x^2(fg) \le E_x^2(|f|\,|g|)$

with probability one. Unless the Lemma holds there are three positive numbers
a, b, c with $a > bc$ for which the event

$\{E_x^2(fg) > a,\ E_x(f^2) < b,\ E_x(g^2) < c\} = H$

has positive probability. Then denoting by h the characteristic function of H
and using the Schwarz inequality for integrals, we have

$aP^2(H) \le E^2[hE_x(fg)] = E^2(hfg) \le E(hf^2)E(hg^2) = E[hE_x(f^2)]\,E[hE_x(g^2)] \le bcP^2(H)$,

which is impossible. This completes the proof of the Lemma.

The Lemma, with f = y, g = 1, yields $E_x^2(y) \le E_x(y^2)$ with probability one,
which implies the finiteness of $\sigma^2 E_x y$ and hence completes the proof of Theorem 2.

3. Unbiased sequential estimation. Consider a chance variable z whose
distribution depends on a parameter $\theta$. If we have an unbiased estimate t(z)
and a sufficient statistic u(z) (not necessarily a single numerical chance variable)
for $\theta$, then, as mentioned in section 2, $v(u) = E(t \mid u)$ is an unbiased estimate for $\theta$
depending only on u.³ We have shown that the variance of v is never greater
than that of t, and we shall see that it is sometimes much smaller (see example II
at the end of this section). The estimate obtained in this section for the parameter
of a sequential process is of the v type; its importance lies in the fact that
in many cases there is an unbiased estimate t (generally poor) which is a function
of the first observation, and which will consequently be an unbiased estimate no
matter what sequential test procedure is used.

Let $x_1, x_2, \cdots$ be a sequence of chance variables whose joint distribution is
determined by an unknown point $\theta$ in a parameter space. A sequential sample
(test) [9] is determined by specifying a sequence of mutually exclusive events
$S_1, S_2, \cdots$, where $S_i$ depends only on $x_1, \cdots, x_i$ and

(4)  $\sum_{i=1}^{\infty} P(S_i) = 1$  for all $\theta$.

The event $S_i$ is that sampling stops after the ith observation, and (4) ensures that
sampling stops eventually. Thus if we define the chance variable n = i when $S_i$
occurs, n is the size of the sample.

³ It was pointed out by the referee that, strictly speaking, u does not have to be sufficient;
it is necessary only that v(u) be independent of $\theta$. The author is indebted to the referee for
many valuable suggestions.

Denote by $u_1, u_2, \cdots$ any sequence of chance variables such that
$u_i = u_i(x_1, \cdots, x_i)$ is a sufficient statistic for estimating $\theta$ from $x_1, \cdots, x_i$. There
will of course be many such sequences $\{u_i\}$, but it often happens that there is
one which arises in a natural way from the sequential process; if we are sampling
from a binomial population, for instance, $u_i$ = number of defectives in the first i
observations is a sufficient statistic. We shall suppose that the sequential test
satisfies the following condition

(6)  $S_i = W_i\,C(S_1 + \cdots + S_{i-1})$

where $W_i$ is an event depending on $u_i$ only. This condition means that when
the ith observation is taken, the decision to stop at this point depends only on
the ith sufficient statistic $u_i$. For the binomial example mentioned above, this
means that the decision to stop after i observations depends only on the number
of defectives observed at that stage, and not on the order in which they were
observed. The Neyman criterion for $u_i$ to be a sufficient statistic [7, 10, p. 136]
shows that (6) is no restriction whatever for the sequential probability ratio
test [9] since the ratio in terms of which the test is defined will be a function of
$u_i$ only.

Let $t_1, t_2, \cdots$ be any sequence of chance variables such that $t_i$ is a function of
$x_1, \cdots, x_i$; define $t = t_i$ when $S_i$ occurs. If $E(t) = \theta$, t is said to be an unbiased
estimate for $\theta$ (relative to the particular sequential test $\{S_i\}$). The theory of
sequential sampling has been formulated primarily for testing hypotheses; a
problem which arises naturally and often is the following: After a sequential
sample has been obtained, is there an unbiased estimate for $\theta$? Since a sample
of constant size is a special case of a sequentially selected sample, we cannot
hope to find unbiased estimates for arbitrary sequential samples unless such
estimates exist for samples of every constant size. This is equivalent to the
existence of a function $t(x_1)$ for which $E(t) = \theta$ for all $\theta$. Our problem is to
discover an unbiased estimate for $\theta$ which, when n = i, is a function of $u_i$ alone.
Such an estimate has been found by Girshick, Mosteller, and Savage [4] for
sequential samples from a binomial population. It turns out that whenever
there is any unbiased estimate at all for a particular sequential test, there is
also one of the type described. Thus, if there is an unbiased estimate t for
samples of fixed size N, there will be an unbiased estimate of the type described
for every sequential test requiring at least N observations, since t is itself an
unbiased estimate for such sequential tests.

Denote by t any unbiased estimate for $\theta$ relative to a particular sequential
test $\{S_i\}$. Denote by $w_i$, $h_i$ the characteristic functions of the events $W_i$,
$C(S_1 + \cdots + S_i)$ respectively, and define $u = u_i$, $v = E(h_{i-1}t_i \mid u_i)/E(h_{i-1} \mid u_i)$
when n = i. To justify the definition of v we remark that the event
$\{n = i,\ E(h_{i-1} \mid u_i) = 0\}$ has probability zero, since $qh_{i-1} \le h_{i-1}$ with probability one,
where q is the characteristic function of the event $\{E(h_{i-1} \mid u_i) > 0\}$, while

$E(qh_{i-1}) = E[qE(h_{i-1} \mid u_i)] = E[E(h_{i-1} \mid u_i)] = E(h_{i-1})$.

⁴ For any event A, C(A) denotes the event that A does not occur.

Since $u_i$ is a sufficient statistic for $\theta$ with respect to $x_1, \cdots, x_i$, v is a function of
u and n only, independent of $\theta$. The main result of this section is

Theorem 3. v is an unbiased estimate for $\theta$.

Proof: We shall show that $v = E(t \mid u, n)$. This not only shows that v is an
unbiased estimate for $\theta$, but also interprets v in a very simple way and, as mentioned
above, implies that the variance of v does not exceed that of t. It must
be verified that for every event D depending only on n and u, $E(dv) = E(dt)$,
where d is the characteristic function of D. Now $D = \sum_{i=1}^{\infty} DS_i$, and $DS_i = D_iS_i$
where $D_i$ is an event depending only on $u_i$. It is sufficient, then, to show
$E(d_iw_ih_{i-1}v) = E(d_iw_ih_{i-1}t)$, where $d_i$ is the characteristic function of $D_i$. Now

$E(d_iw_ih_{i-1}v) = E[d_iw_ih_{i-1}E(h_{i-1}t_i \mid u_i)/E(h_{i-1} \mid u_i)]$,

using the definition of v. The function in brackets is $h_{i-1}$ multiplied by a function
of $u_i$; by Theorem 1 its expectation is unaltered if $h_{i-1}$ is replaced by $E(h_{i-1} \mid u_i)$.
Thus the right member of the last equality equals

$E[d_iw_iE(h_{i-1}t_i \mid u_i)] = E(d_iw_ih_{i-1}t_i) = E(d_iw_ih_{i-1}t)$.

We conclude with two examples: 

I. Binomial and Poisson distributions. Suppose $x_1, x_2, \cdots$ are independent
with identical distributions, either binomial or Poisson, with parameter
$\theta$. Then $t = x_1$ ($= t_i$ for all i) is an unbiased estimate for $\theta$, and it is well known
that $u_i = x_1 + \cdots + x_i$ is a sufficient statistic for estimating $\theta$ from $x_1, \cdots, x_i$.

For any sequential test satisfying (6) our unbiased estimate for $\theta$ will be

$v = \frac{E(h_{i-1}x_1 \mid u_i = u)}{E(h_{i-1} \mid u_i = u)} = \frac{E(h_{i-1}x_1f)}{E(h_{i-1}f)}$

when n = i, $u_i = u$, where f is the characteristic function of the event $u_i = u$.
Then

$v = \frac{\sum_{j=0}^{\infty} j\,k_j(u, i)}{\sum_{j=0}^{\infty} k_j(u, i)}$  for Poisson,

$v = \frac{k_1(u, i)}{\sum_{j=0}^{1} k_j(u, i)}$  for binomial,

where $k_j(u, i)$ denotes the number of possible sequences $x_1, \cdots, x_i$ for which
$n \ge i$, $x_1 + \cdots + x_i = u$, and $x_1 = j$. For the binomial case, this is the estimate
found in [4].

II. Samples of constant size. We consider the special case where a
sample of constant size N is selected, $x_1, \cdots, x_N$ are independent with identical
distributions, and the density function for $x_i$ has the form

(7)  $p(x, \theta) = r(\theta)\,s(\theta)^{w(x)}\,q(x)$


considered by Koopman [6]. Suppose further that there is an unbiased estimate
$t(x_1)$ for $\theta$. These conditions will be satisfied, for instance, if $\theta$ is the mean of a
binomial, Poisson, or normal distribution, with w(x) = t(x) = x. Then
$u_N = w(x_1) + \cdots + w(x_N)$ is a sufficient statistic. Our estimate v becomes simply
$v = E[t(x_1) \mid u_N]$. Now $E[t(x_1) \mid u_N] = \cdots = E[t(x_N) \mid u_N]$, since $u_N$ is a symmetric
function of $x_1, \cdots, x_N$, which are independent with identical distributions.
Consequently

$v = E\left[\sum_{i=1}^{N} t(x_i)/N \;\Big|\; u_N\right]$

so that

$\sigma^2 v \le \sigma^2 t(x_1)/N$.

In the special case w(x) = t(x) = x, we have $v = \sum_{i=1}^{N} x_i/N$, i.e. our estimate is
simply the mean of the N observations $x_1, \cdots, x_N$.
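
The binomial estimate of Example I can be checked by direct enumeration. The sketch below assumes observations taking the values 0 and 1 and a hypothetical stopping rule (`majority_of_three`) that depends on $(i, u_i)$ only, as condition (6) requires; it counts the sequences $k_j(u, i)$ and verifies unbiasedness exactly for a chosen $\theta$. All function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def majority_of_three(i, u):
    """Hypothetical stopping rule W_i: stop once two successes or two failures are seen."""
    return u >= 2 or i - u >= 2


def path_counts(stop, max_n):
    """Return counts[(i, u)] = [k_0, k_1], where k_j is the number of 0/1
    sequences x_1..x_i with x_1 = j, x_1 + ... + x_i = u and n >= i."""
    counts = defaultdict(lambda: [0, 0])
    frontier = {(0, None): 1}  # (u, x_1) -> number of paths not yet stopped
    for i in range(1, max_n + 1):
        next_frontier = defaultdict(int)
        for (u, first), ways in frontier.items():
            for x in (0, 1):
                j = x if first is None else first
                counts[(i, u + x)][j] += ways
                if not stop(i, u + x):
                    next_frontier[(u + x, j)] += ways
        frontier = next_frontier
    if frontier:
        raise ValueError("the rule has not stopped every path by max_n")
    return dict(counts)


def estimate(counts, i, u):
    """v = k_1(u, i) / (k_0(u, i) + k_1(u, i)) at a stopping point (i, u)."""
    k0, k1 = counts[(i, u)]
    return Fraction(k1, k0 + k1)


def expected_estimate(stop, max_n, theta):
    """E(v), summing over the stopping points reached with positive probability."""
    counts = path_counts(stop, max_n)
    total = Fraction(0)
    for (i, u), (k0, k1) in counts.items():
        if stop(i, u):
            prob = (k0 + k1) * theta**u * (1 - theta) ** (i - u)
            total += prob * estimate(counts, i, u)
    return total


theta = Fraction(1, 3)
print(expected_estimate(majority_of_three, 3, theta) == theta)
```

The enumeration works because, in the binomial case, all sequences of length i with the same sum u are equally probable, so the conditional expectation reduces to a ratio of counts.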
THE DISTRIBUTION OF THE MEAN

By E. L. Welker
University of Illinois

1. Summary. Both population and sample mean distributions can be represented
or approximated by Pearson curves if the first four moments of the
population are finite. Using the $\alpha_3^2, \delta$ chart of Craig [2] to determine the Pearson
curve type for the population, an analogous $\bar\alpha_3^2, \bar\delta$ chart is derived for the
distribution of the mean. This defines a one to one transformation of $\alpha_3^2, \delta$ into
$\bar\alpha_3^2, \bar\delta$. The properties of this transformation are used to discuss the approach
to normality of the distribution of the mean as dictated by the central limit
theorem. This is facilitated by superposing on the $\alpha_3^2, \delta$ chart the $\bar\alpha_3^2, \bar\delta$ charts
for samples of 2, 5, and 10.

2. Introduction. For any given distribution function of a population, a 
method is available for finding the distribution function of the mean, when it 
exists, that depends on characteristic functions and the Fourier integral theorem. 
For example, characteristic functions have been used to show that the arithmetic
mean of samples from a normal population is normal, and, with minor restrictions
on non-normal populations, that it is asymptotically normal. The method
depends, of course, on a knowledge of the exact population distribution.
Some authors have discussed the approximation of the distributions of sample
means in special cases by one of the Pearson curves. It is the purpose of this
paper to consider the complete range of Pearson curves as populations to be 
sampled, then to give the sampling distributions of the mean as approximated 
by the Pearson system, and to discuss the manner in which the distribution of 
the mean approaches the normal curve as dictated by the central limit theorem. 
Since the choice of a Pearson curve depends only on moment relationships, this 
will include the approximation of the distribution of the mean for any parent 
population as based on its moments. Both an algebraic and a graphic analysis 
will be given. 

3. Seminvariant and moment relationships. Denote by $\alpha_k$ the kth order
moment of the population with zero mean and unit variance. Let $\lambda_k$ be the kth
order seminvariant of the population. Let $\bar\alpha_k$ and $\bar\lambda_k$ be the same parameters
of the distribution of $\bar{x}$, the mean of a random sample of size N drawn from this
parent population. Using properties of the seminvariants of linear functions of
variables independent in the probability sense, formulas relating these parameters
[1] are

$\bar\lambda_k = \lambda_k N^{1-k}$,

$\bar\alpha_3 = \alpha_3 N^{-1/2}$,

and

$\bar\alpha_4 = [\alpha_4 + 3(N - 1)]N^{-1}$.

4. The Pearson system of curves and the distribution of the mean. The
determination of the Pearson curve will be made in accordance with the scheme
discussed by C. C. Craig [2]. In this system the curve type is fixed by the
moment $\alpha_3$ and the constant

$\delta = \frac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}$.

Fig. 1. The $\alpha_3^2, \delta$ Chart for Pearson's Curves

The scheme for determining the type of curve is shown graphically in Fig. 1 in
which the $\alpha_3^2, \delta$ plane is divided into areas in which the Pearson curve types are
noted. The bounding $\alpha_3^2, \delta$ curves are

$\delta = -1$,  $\delta = -\frac{1}{2}$,  $\delta = 0$,  $\delta = \frac{2}{5}$,  $\alpha_3^2 = 0$,

$\alpha_3^2 = 4\delta(\delta + 2)$,  and  $(2 + 3\delta)\alpha_3^2 = 4(1 + 2\delta)^2(2 + \delta)$.

Let $\bar\delta$ denote the value of the $\delta$ function for the distribution of the mean. Then

$\bar\delta = \frac{2\bar\alpha_4 - 3\bar\alpha_3^2 - 6}{\bar\alpha_4 + 3}$.

In terms of moments of the parent population

$\bar\delta = \frac{2\cdot\frac{\alpha_4 + 3(N-1)}{N} - \frac{3\alpha_3^2}{N} - 6}{\frac{\alpha_4 + 3(N-1)}{N} + 3} = \frac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3 + 6(N - 1)}$.

We see that $\bar\delta = \delta$ for N = 1, and $|\bar\delta| < |\delta|$ for N > 1. Both $\bar\delta$ and $\bar\alpha_3$ approach
zero as N approaches infinity. These are the values of the constants for the
normal function. This result is expected from the central limit theorem.
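
The relations of sections 3 and 4 are easy to check numerically. The sketch below computes $\bar\alpha_3$, $\bar\alpha_4$ and $\bar\delta$ from the population values and compares the two expressions for $\bar\delta$. The function names are illustrative, and the test populations are the standard exponential ($\alpha_3 = 2$, $\alpha_4 = 9$) and the uniform ($\alpha_3 = 0$, $\alpha_4 = 1.8$), which are not discussed in the text at this point.

```python
def delta(alpha3, alpha4):
    """Craig's constant: (2*alpha4 - 3*alpha3^2 - 6) / (alpha4 + 3)."""
    return (2.0 * alpha4 - 3.0 * alpha3**2 - 6.0) / (alpha4 + 3.0)


def mean_moments(alpha3, alpha4, n):
    """Standardized third and fourth moments of the mean of n observations."""
    return alpha3 / n**0.5, (alpha4 + 3.0 * (n - 1)) / n


def delta_bar(alpha3, alpha4, n):
    """delta evaluated at the moments of the distribution of the mean."""
    a3_bar, a4_bar = mean_moments(alpha3, alpha4, n)
    return delta(a3_bar, a4_bar)


def delta_bar_closed_form(alpha3, alpha4, n):
    """The same quantity written directly in terms of the population moments."""
    return (2.0 * alpha4 - 3.0 * alpha3**2 - 6.0) / (alpha4 + 3.0 + 6.0 * (n - 1))


for name, a3, a4 in (("exponential", 2.0, 9.0), ("uniform", 0.0, 1.8)):
    for n in (1, 2, 5, 10):
        print(name, n, delta_bar(a3, a4, n), delta_bar_closed_form(a3, a4, n))
```

For the exponential population $\delta = 0$ and $\bar\delta$ stays at 0 for every N, in agreement with the invariance of the line $\delta = 0$; for the uniform population $\bar\delta$ moves from $-\frac{1}{2}$ toward 0 as N grows.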


5. The $\bar\alpha_3^2, \bar\delta$ diagram for varying sample size. For every given population
with finite moments of orders 1 through 4 there exists a Pearson curve representing
or approximating its distribution. This determines a point in the $\alpha_3^2, \delta$
plane. For a given sample size, N, there corresponds a point in the $\bar\alpha_3^2, \bar\delta$ plane.
If the point $(\bar\alpha_3^2, \bar\delta)$ is now plotted on the $\alpha_3^2, \delta$ plane, we can determine the type
of Pearson curve which is needed to approximate the distribution of $\bar{x}$. The
transformation of $\alpha_3^2, \delta$ into $\bar\alpha_3^2, \bar\delta$ enables us to analyze the relationship between
population distributions and distributions of $\bar{x}$. The transforms of the boundary
curves in the $\alpha_3^2, \delta$ plane will constitute an $\bar\alpha_3^2, \bar\delta$ chart corresponding to the one
for $\alpha_3^2, \delta$ shown in Fig. 1. In studying the approach to normality of the distribution
of $\bar{x}$, it is illuminating to superimpose this $\bar\alpha_3^2, \bar\delta$ chart on the $\alpha_3^2, \delta$ chart.
In order to do this, it is necessary to make certain algebraic changes in the
equations.

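First eliminate $\alpha_4$ from the formula for $\bar\delta$ as follows. From

$\delta = \frac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}$

we find

$\alpha_4 = \frac{3\delta + 3\alpha_3^2 + 6}{2 - \delta}$.

Substitute this in the expression for $\bar\delta$. Then

$\bar\delta = \frac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3 + 6(N - 1)} = \frac{\delta(\alpha_3^2 + 4)}{\alpha_3^2 + 4 + 2(N - 1)(2 - \delta)}$.

This formula, in conjunction with

$\alpha_3^2 = N\bar\alpha_3^2$

enables us to write the transformations of the boundary curves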

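Boundary Curve: $\delta = -1$
Transformed Curve: $\bar\delta = \frac{-(N\bar\alpha_3^2 + 4)}{N\bar\alpha_3^2 + 4 + 6(N - 1)}$

Boundary Curve: $\delta = -\frac{1}{2}$
Transformed Curve: $\bar\delta = \frac{-(N\bar\alpha_3^2 + 4)}{2(N\bar\alpha_3^2 + 4) + 10(N - 1)}$

Boundary Curve: $\delta = 0$
Transformed Curve: $\bar\delta = 0$

Boundary Curve: $\delta = \frac{2}{5}$
Transformed Curve: $\bar\delta = \frac{2(N\bar\alpha_3^2 + 4)}{5(N\bar\alpha_3^2 + 4) + 16(N - 1)}$

Boundary Curve: $\alpha_3^2 = 4\delta(\delta + 2)$
Transformed Curve: $\bar\alpha_3^2\,[N\bar\alpha_3^2 + 4 + 2\bar\delta(N - 1)]^2 = 4\bar\delta(\bar\alpha_3^2 + 4)[\bar\delta(N\bar\alpha_3^2 + 8N - 4) + 2N\bar\alpha_3^2 + 8]$

Boundary Curve: $(2 + 3\delta)\alpha_3^2 = 4(1 + 2\delta)^2(2 + \delta)$
Transformed Curve: $[\bar\delta(16N + 3N\bar\alpha_3^2 - 4) + 2N\bar\alpha_3^2 + 8]\,[N\bar\alpha_3^2 + 4 + 2\bar\delta(N - 1)]^2\,N\bar\alpha_3^2 = 4[\bar\delta(2N\bar\alpha_3^2 + 10N - 2) + N\bar\alpha_3^2 + 4]^2\,[\bar\delta(N\bar\alpha_3^2 + 8N - 4) + 2N\bar\alpha_3^2 + 8]$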

Fig. 2 shows the chart for distributions of $\bar{x}$ for N = 2 by dashed curves
superimposed on the chart for the population shown by the solid curves, and
Fig. 3 consists of the same curves for N = 5 and N = 10. The intervals on the
population values are $0 \le \alpha_3^2 \le 12$ and $-1 \le \delta \le .4$ in Fig. 2, but only part
of the $\alpha_3^2$ range is shown in Fig. 3. In each case the curves for the distribution
of $\bar{x}$ cover the interval for $\bar\alpha_3^2, \bar\delta$ which corresponds to the entire interval shown
for the population in Fig. 2. Population curves are identified by capital letters
and the corresponding curves for the distribution of $\bar{x}$ by the corresponding lower
case letters.

Before discussing the Pearson curve relationships disclosed by these graphs,
let us analyze some of the geometric properties of the transformation itself.
Let N be considered as the parameter defining families of curves in the $\bar\alpha_3^2, \bar\delta$
plane corresponding to $\alpha_3^2$ = constant and $\delta$ = constant, the systems of lines
parallel to the coordinate axes. The transform of $\alpha_3^2 = k$ is $\bar\alpha_3^2 = k/N$, a system
of lines perpendicular to $\bar\delta = 0$, and approaching $\bar\alpha_3^2 = 0$ with increasing N at
the rate $kN^{-1}$. The line $\alpha_3^2 = 0$ is invariant under the transformation, but it is
not pointwise invariant.

Fig. 3. The $\alpha_3^2, \delta$ and $\bar\alpha_3^2, \bar\delta$ Charts
The transform of $\delta = C$ is

$\bar{\delta} = \dfrac{C(N\bar{\alpha}_3^2 + 4)}{N\bar{\alpha}_3^2 + 4 + 2(N - 1)(2 - C)} = \dfrac{C(\bar{\alpha}_3^2 + 4/N)}{\bar{\alpha}_3^2 + [4 + 2(N - 1)(2 - C)]N^{-1}}.$

Solving for $\bar{\alpha}_3^2$, this becomes

$\bar{\alpha}_3^2 = \dfrac{4C - \bar{\delta}[4 + 2(N - 1)(2 - C)]}{N(\bar{\delta} - C)}.$

Except for the straight line $\bar{\delta} = 0$, obtained when $C = 0$, this is a system of
rectangular hyperbolas with asymptotes

$\bar{\alpha}_3^2 = -[4 + 2(N - 1)(2 - C)]N^{-1}$ and $\bar{\delta} = C$.

We are concerned only with the range $\bar{\alpha}_3^2 > 0$. Hence

$-[4 + 2(N - 1)(2 - C)]N^{-1}$

must be positive for the asymptote to show on the diagram. Since $|\delta| < 2$,
and thus $|C| < 2$, the expression in brackets is necessarily positive. Hence
the vertical asymptote is always outside the range of interest and will not show
on the diagram. However the horizontal asymptotes, $\bar{\delta} = C$, do appear in all
cases. The hyperbolas are concave downward if $C > 0$ and are concave upward
if $C < 0$.
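The transform of the line $\delta = C$ can be checked numerically. The sketch below assumes the criterion $\delta = (2\beta_2 - 3\beta_1 - 6)/(\beta_2 + 3)$ with $\beta_1 = \alpha_3^2$, a definition that is not restated in this section and is taken here as an assumption, together with the usual moment relations for a mean of $N$ independent observations. The function names are illustrative only.

```python
def mean_criteria(alpha3_sq, delta, n):
    """Return (alpha3_sq_bar, delta_bar) for the mean of n observations.

    Assumes delta = (2*beta2 - 3*beta1 - 6) / (beta2 + 3) with beta1 = alpha3_sq
    (an assumed definition) and requires delta != 2.
    """
    beta1 = alpha3_sq
    # Invert the assumed definition of delta to recover beta2.
    beta2 = (3.0 * beta1 + 6.0 + 3.0 * delta) / (2.0 - delta)
    # Moment relations for the mean of n independent observations.
    beta1_bar = beta1 / n
    beta2_bar = 3.0 + (beta2 - 3.0) / n
    delta_bar = (2.0 * beta2_bar - 3.0 * beta1_bar - 6.0) / (beta2_bar + 3.0)
    return beta1_bar, delta_bar


def delta_bar_closed_form(alpha3_sq_bar, c, n):
    """Closed-form transform of the line delta = c, as stated in the text."""
    numerator = c * (n * alpha3_sq_bar + 4.0)
    denominator = n * alpha3_sq_bar + 4.0 + 2.0 * (n - 1) * (2.0 - c)
    return numerator / denominator


if __name__ == "__main__":
    for n in (2, 5, 10):
        a_bar, d_bar = mean_criteria(alpha3_sq=2.0, delta=0.4, n=n)
        print(n, a_bar, d_bar, delta_bar_closed_form(a_bar, 0.4, n))
```

Under these assumptions the two computations agree, and the point $(0, 0)$ maps to itself for every $N$.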

Lines of the pencil $\delta = m\alpha_3^2$ are transformed into the hyperbolas

$\bar{\delta} = \dfrac{m\bar{\alpha}_3^2(N\bar{\alpha}_3^2 + 4)}{\bar{\alpha}_3^2 - 2m\bar{\alpha}_3^2(N - 1) + 4}$

for $N > 1$. It is clear that $(0, 0)$ is the only invariant point. Every point on
$\delta = m\alpha_3^2$ is transformed into a point closer to the origin, the square of the distance
from the origin changing from

$(m^2 + 1)\alpha_3^4$ to $(m^2 + 1)\alpha_3^4 N^{-2}$.

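It is easily verified that the hyperbolas are asymptotic to

$\bar{\delta} = \dfrac{mN\bar{\alpha}_3^2}{1 - 2m(N - 1)} - \dfrac{4m(N - 1)(1 + 2m)}{[1 - 2m(N - 1)]^2}$ and $\bar{\alpha}_3^2 = \dfrac{-4}{1 - 2m(N - 1)}.$

As $N$ approaches infinity, these asymptotes approach

$\bar{\delta} = -\tfrac{1}{2}\bar{\alpha}_3^2$ and $\bar{\alpha}_3^2 = 0$.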

An area in quadrant one (four) in the $\alpha_3^2$, $\delta$ plane is transformed into an area in
quadrant one (four) in the $\bar{\alpha}_3^2$, $\bar{\delta}$ plane. The transformed area is nearer the
origin.

6. Types of Pearson curves for distribution of sample means. Examination
of the graphs in conjunction with the above described properties of the
transformation shows the following facts regarding the distribution of means of
samples drawn from populations identified by $\alpha_3^2$ and $\delta$. First consider the
normal function and the three main Pearson types only.

Parent Population          Distribution of Sample Means

Normal                     Normal
$I_B$                      $I_B$
$I_J$                      $I_J$ and $I_B$
$I_U$                      $I_U$, $I_J$ and $I_B$
IV                         IV
$VI_B$                     $VI_B$ and IV
$VI_J$                     $VI_J$, $VI_B$ and IV.
The transition types were disregarded completely in the above analysis. It is
worth noting that, disregarding type X, III is transformed into III, VII into
VII, $II_U$ into $II_B$, never into $II_U$, V into IV, but never into V. Type X is
transformed into type III, never into X. Others follow a similar pattern.

These moment relationships on the distribution of the mean are not sufficient
conditions in general. In special cases they are, for example the normal
distribution and the type III (see [3]). They do represent the best approximation
curve as specified by the Pearson system. We know that in some cases, for
example type II (see [3]), the distribution of means is not described by a Pearson
curve. It is clear, however, that the approach to normality is indicated
analytically by the transformation $\alpha_3^2$, $\delta$ to $\bar{\alpha}_3^2$, $\bar{\delta}$ and is shown graphically by the
$\bar{\alpha}_3^2$, $\bar{\delta}$ diagram. Skewness and kurtosis in the parent population are reflected
in the distribution of the mean in small samples. A symmetric distribution
of the mean requires a symmetric parent population regardless of sample size,
but the degree of skewness decreases rapidly with an increasing number in the
sample. The Pearson curve which approximates the distribution of $\bar{x}$ from a
bell-shaped parent population is also bell-shaped. The Pearson curve
approximating the distribution of $\bar{x}$ for samples of $N = 10$ (Fig. 3) is bell-shaped for
any parent population with values of $\alpha_3^2$ and $\delta$ within the intervals considered.
For samples of 5 in the same range the approximating curve is either bell-shaped
or J-shaped, but it is never U-shaped. For samples of 2, even the U-shaped
distribution is possible, but only with extreme values of $\alpha_3^2$ and $\delta$. The point in
the $\alpha_3^2$, $\delta$ plane corresponding to the normal curve is the only invariant point in
the transformation. Hence parent populations with parameters not satisfying
$\alpha_3^2 = \delta = 0$ cannot yield normal distributions of sample means.
NOTES

This section is devoted to brief research and expository articles on methodology
and other short items.


ON THE STUDENTIZATION OF SEVERAL VARIANCES


By B. L. Welch 
University of Leeds, England 


1. Introduction. In a recent paper [1] the author considered the problem
of eliminating several variances simultaneously from probability statements
concerning the mean of a normally distributed variable. The general situation
envisaged was as follows. We supposed that we had an observed quantity $y$
which could be assumed to be normally distributed about a population mean
$\eta$ with variance $\sigma_y^2 = \sum_{i=1}^{k} \lambda_i\sigma_i^2$, where the $\lambda_i$ are known positive numbers and the
$\sigma_i^2$ unknown population variances. It was supposed further that the data
provided estimates $s_i^2$ of the $\sigma_i^2$ based on $f_i$ degrees of freedom, and having the
sampling distributions

(1)  $p(s_i^2)\,ds_i^2 = \dfrac{1}{\Gamma(\tfrac{1}{2}f_i)}\left(\dfrac{f_i s_i^2}{2\sigma_i^2}\right)^{\tfrac{1}{2}f_i - 1}\exp\left\{-\dfrac{f_i s_i^2}{2\sigma_i^2}\right\}\,d\left(\dfrac{f_i s_i^2}{2\sigma_i^2}\right),$

and that these estimates were distributed independently of each other and of $y$.
The problem was to make statements about the magnitude of the difference
$y - \eta$ which would involve explicitly only the observed variances $s_i^2$. The
probability of the truth of the statements was also to be entirely independent
of the population values $\sigma_i^2$.

The solution was given implicitly in a formal mathematical expression and a
general process of developing successive terms in a series expansion was
described. In the present communication a slightly different way of reaching this
development is provided.

2. General method. If the $f_i$ are large enough the ratio

(2)  $v = \dfrac{y - \eta}{\sqrt{\Sigma\lambda_i s_i^2}}$

can be taken to be normally distributed with mean zero and standard deviation
unity. This suggests that, when the $f_i$ are not necessarily large, we might
approach the matter by seeking some other function

(3)  $x = g[s_1^2, s_2^2, \cdots, s_k^2, y - \eta]$

which will still be normally distributed with the same mean and standard
deviation. We shall see that such a function can be found, although the method
to be followed leads us first to another expression

(4)  $y - \eta = h(s_1^2, s_2^2, \cdots, s_k^2, x)$

which is simply the transposed form of (3). Once we have obtained $h$ we can
solve out from (4) to obtain $x$.

Since the distribution of $y$ is independent of the $s_i^2$ we have

(5)  $p(y \mid s^2)\,dy = \dfrac{1}{\sqrt{2\pi\Sigma\lambda_i\sigma_i^2}}\exp\left\{-\tfrac{1}{2}\dfrac{(y - \eta)^2}{\Sigma\lambda_i\sigma_i^2}\right\}\,dy.$

Transforming therefore to the new variable $x$ we have for given $s_i^2$

(6)  $p(x \mid s^2)\,dx = \dfrac{1}{\sqrt{2\pi\Sigma\lambda_i\sigma_i^2}}\exp\left\{-\tfrac{1}{2}\dfrac{h^2(s^2, x)}{\Sigma\lambda_i\sigma_i^2}\right\}\dfrac{\partial h(s^2, x)}{\partial x}\,dx = j\{s^2, x, \Sigma\lambda_i\sigma_i^2\}\,dx$ (say).

The unrestricted distribution of $x$ is then obtained by averaging over the joint
distribution of the $s_i^2$. In order that $x$ should be a unit normal deviate we must
therefore have

(7)  $p(x) = \int\cdots\int j\{s^2, x, \Sigma\lambda_i\sigma_i^2\}\prod_i\{p(s_i^2)\,ds_i^2\} = \dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}.$

We have to substitute from (1) and (6) into (7) and then choose the function
$h(s^2, x)$ in such a manner that the equation is satisfied whatever may be the
values of the unknown $\sigma_i^2$. To evaluate the function by the methods of
numerical integration is probably impracticable except perhaps in some simple special
cases. A series development is, however, quite feasible.

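Symbolically we can write

(8)  $j\{s^2, x, \Sigma\lambda_i\sigma_i^2\} = \exp\{\Sigma(s_i^2 - \sigma_i^2)\partial_i\}\,j\{w, x, \Sigma\lambda_i\sigma_i^2\},$

where $\partial_i$ denotes differentiation with respect to $w_i$ and subsequent equation to
$\sigma_i^2$. Equation (7) then integrates out to give

(9)  $\prod_i\left[e^{-\sigma_i^2\partial_i}\left(1 - \dfrac{2\sigma_i^2\partial_i}{f_i}\right)^{-\frac{1}{2}f_i}\right]j\{w, x, \Sigma\lambda_i\sigma_i^2\} = \dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2},$

i.e.

(10)  $\Theta\,j\{w, x, \Sigma\lambda_i\sigma_i^2\} = \dfrac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}$ (say).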

The operator $\Theta$ must be expanded in powers of $1/f_i$ before it can be interpreted.
When this is done we find, to the first order, $\Theta = 1 + \Sigma\,\sigma_i^4\partial_i^2/f_i + \cdots$.
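Our procedure now is to find successive approximations to $h(s^2, x)$. It will
be convenient to denote by $h_r(s^2, x)$ an expression which equals $h(s^2, x)$ to terms
of order $1/f_i^r$. Further let $c_{r+1}(s^2, x)$ be a corrective term which when added on
to $h_r(s^2, x)$ will give a result correct to terms in $1/f_i^{r+1}$. Then to this order we
shall have from (6)

(13)  $\sqrt{2\pi}\,j\{w, x, \Sigma\lambda_i\sigma_i^2\} = \dfrac{1}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\exp\left\{-\tfrac{1}{2}\dfrac{h_r^2(w, x)}{\Sigma\lambda_i\sigma_i^2}\right\}\dfrac{\partial h_r(w, x)}{\partial x} + \dfrac{1}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\exp\left\{-\tfrac{1}{2}\dfrac{x^2(\Sigma\lambda_i w_i)}{\Sigma\lambda_i\sigma_i^2}\right\}\left[\dfrac{\partial c_{r+1}(w, x)}{\partial x} - \dfrac{x(\Sigma\lambda_i w_i)\,c_{r+1}(w, x)}{\Sigma\lambda_i\sigma_i^2}\right],$

remembering that the leading term in $h(w, x)$ is $x\sqrt{\Sigma\lambda_i w_i}$.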
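Hence from (10) we find

(14)  $\Theta\left[\dfrac{1}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\exp\left\{-\tfrac{1}{2}\dfrac{h_r^2(w, x)}{\Sigma\lambda_i\sigma_i^2}\right\}\dfrac{\partial h_r(w, x)}{\partial x}\right] + \dfrac{1}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\,e^{-\frac{1}{2}x^2}\left[\dfrac{\partial c_{r+1}(\sigma^2, x)}{\partial x} - x\,c_{r+1}(\sigma^2, x)\right] = e^{-\frac{1}{2}x^2},$

i.e.

(15)  $\dfrac{\partial}{\partial x}\left(e^{-\frac{1}{2}x^2}\dfrac{c_{r+1}(\sigma^2, x)}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\right) + \Theta\left[\exp\left\{-\tfrac{1}{2}\dfrac{h_r^2(w, x)}{\Sigma\lambda_i\sigma_i^2}\right\}\dfrac{1}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\dfrac{\partial h_r(w, x)}{\partial x}\right] = e^{-\frac{1}{2}x^2}.$

Given $h_r$ we can therefore proceed directly to $c_{r+1}$ and hence to $h_{r+1}$.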

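3. Application to give terms in $1/f_i$. It will be sufficient illustration of the
method, if we show here how to obtain $h_1$ from $h_0$. We have from (15)

(16)  $\dfrac{\partial}{\partial x}\left(e^{-\frac{1}{2}x^2}\dfrac{c_1(\sigma^2, x)}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\right) + \Theta\left[\left(\dfrac{\Sigma\lambda_i w_i}{\Sigma\lambda_i\sigma_i^2}\right)^{\frac{1}{2}}\exp\left\{-\tfrac{1}{2}\dfrac{x^2(\Sigma\lambda_i w_i)}{\Sigma\lambda_i\sigma_i^2}\right\}\right] = e^{-\frac{1}{2}x^2},$

i.e.

(17)  $\dfrac{\partial}{\partial x}\left(e^{-\frac{1}{2}x^2}\dfrac{c_1(\sigma^2, x)}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\right) + \dfrac{(\Sigma\lambda_i^2\sigma_i^4/f_i)}{(\Sigma\lambda_i\sigma_i^2)^2}\,\partial^2\left[u^{\frac{1}{2}}\exp\left(-\tfrac{1}{2}x^2 u\right)\right] = 0,$

where $\partial$ now denotes differentiation with respect to $u$ and subsequent equation
to unity, i.e.

(18)  $\dfrac{\partial}{\partial x}\left(e^{-\frac{1}{2}x^2}\dfrac{c_1(\sigma^2, x)}{\sqrt{\Sigma\lambda_i\sigma_i^2}}\right) = \dfrac{(\Sigma\lambda_i^2\sigma_i^4/f_i)}{(\Sigma\lambda_i\sigma_i^2)^2}\,\tfrac{1}{4}\,e^{-\frac{1}{2}x^2}(1 + 2x^2 - x^4)$

(19)  $\qquad = \dfrac{(\Sigma\lambda_i^2\sigma_i^4/f_i)}{(\Sigma\lambda_i\sigma_i^2)^2}\,\tfrac{1}{4}\,\dfrac{\partial}{\partial x}\left[e^{-\frac{1}{2}x^2}(x + x^3)\right],$

whence

(20)  $c_1(\sigma^2, x) = x\sqrt{\Sigma\lambda_i\sigma_i^2}\left[\dfrac{(1 + x^2)}{4}\dfrac{(\Sigma\lambda_i^2\sigma_i^4/f_i)}{(\Sigma\lambda_i\sigma_i^2)^2}\right].$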
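The two calculus steps behind (18) and (19) can be checked by finite differences. The sketch below is only a numerical sanity check under the reading of (17) to (19) used here: it compares a central-difference second derivative of $u^{1/2}\exp(-\tfrac{1}{2}x^2u)$ at $u = 1$ with $-\tfrac{1}{4}e^{-\frac{1}{2}x^2}(1 + 2x^2 - x^4)$, and the derivative of $e^{-\frac{1}{2}x^2}(x + x^3)$ with $e^{-\frac{1}{2}x^2}(1 + 2x^2 - x^4)$. Function names and step sizes are illustrative choices.

```python
import math


def second_derivative_in_u(x, step=1e-4):
    """Central-difference estimate of d^2/du^2 [sqrt(u) * exp(-x^2 u / 2)] at u = 1."""
    def g(u):
        return math.sqrt(u) * math.exp(-0.5 * x * x * u)
    return (g(1.0 + step) - 2.0 * g(1.0) + g(1.0 - step)) / step ** 2


def closed_form_second_derivative(x):
    """Analytic value of the same second derivative at u = 1."""
    return -0.25 * math.exp(-0.5 * x * x) * (1.0 + 2.0 * x ** 2 - x ** 4)


def derivative_of_primitive(x, step=1e-5):
    """Central-difference estimate of d/dx [exp(-x^2 / 2) * (x + x^3)]."""
    def primitive(t):
        return math.exp(-0.5 * t * t) * (t + t ** 3)
    return (primitive(x + step) - primitive(x - step)) / (2.0 * step)


if __name__ == "__main__":
    for x in (0.0, 0.7, 1.5, 2.5):
        print(x, second_derivative_in_u(x), closed_form_second_derivative(x),
              derivative_of_primitive(x))
```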
Hence to the terms in $1/f_i$ we have

(21)  $y - \eta = h(s^2, x) = x\sqrt{\Sigma\lambda_i s_i^2}\left[1 + \dfrac{(1 + x^2)}{4}\dfrac{(\Sigma\lambda_i^2 s_i^4/f_i)}{(\Sigma\lambda_i s_i^2)^2}\right].$

Solving this out for $x$ we obtain, to the same order

(22)  $x = v\left[1 - \dfrac{(1 + v^2)}{4}\dfrac{(\Sigma\lambda_i^2 s_i^4/f_i)}{(\Sigma\lambda_i s_i^2)^2}\right],$

where $v$ equals $(y - \eta)/\sqrt{\Sigma\lambda_i s_i^2}$. To order $1/f_i$ we may regard $x$ as a unit normal
deviate and hence determine the probability level corresponding to the observed
ratio $v$. On the other hand if we wish to determine the value of $y - \eta$ which
will lie on a given percentage level the expression (21) is the appropriate one
to use.
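Expressions (21) and (22) translate directly into a few lines of code. The following is a minimal sketch, assuming the first-order forms $h = x\sqrt{\Sigma\lambda_i s_i^2}\,[1 + \tfrac{1}{4}(1 + x^2)\,\Sigma(\lambda_i^2 s_i^4/f_i)/(\Sigma\lambda_i s_i^2)^2]$ and its inverse; the function and argument names are illustrative and not part of the note.

```python
import math


def first_order_factor(lam, s2, f):
    """Return sum(lam_i^2 s_i^4 / f_i) / (sum(lam_i s_i^2))^2, the 1/f_i correction scale."""
    numerator = sum((l * v) ** 2 / df for l, v, df in zip(lam, s2, f))
    denominator = sum(l * v for l, v in zip(lam, s2)) ** 2
    return numerator / denominator


def h_first_order(lam, s2, f, x):
    """Value of y - eta lying on the level of the unit normal deviate x, as in (21)."""
    scale = math.sqrt(sum(l * v for l, v in zip(lam, s2)))
    return x * scale * (1.0 + 0.25 * (1.0 + x * x) * first_order_factor(lam, s2, f))


def x_first_order(lam, s2, f, y_minus_eta):
    """Approximate unit normal deviate for an observed y - eta, as in (22)."""
    scale = math.sqrt(sum(l * v for l, v in zip(lam, s2)))
    v = y_minus_eta / scale
    return v * (1.0 - 0.25 * (1.0 + v * v) * first_order_factor(lam, s2, f))


if __name__ == "__main__":
    lam, s2, f = [2.0, 3.0], [1.5, 0.5], [10, 15]
    limit = h_first_order(lam, s2, f, 1.96)
    print(limit, x_first_order(lam, s2, f, limit))
```

With a single variance ($k = 1$) the correction factor reduces to $1/f_1$, so (21) becomes the familiar first-order expansion of a Student quantile, $x[1 + (1 + x^2)/(4f_1)]$ times the standard error.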

4. Further discussion. The present development is of course basically
equivalent to that given in the previous paper. Indeed if we integrate (10) or
(15) out with respect to $x$ we arrive immediately at the formulae which were then
obtained and which were illustrated by calculating terms to order $1/f_i^2$. In
fact when calculating higher order terms it seems best to do this integration
before carrying out the operation $\Theta$. The object of the present note is really to
stress the fact that we are simply finding a function of the observations and of
$y - \eta$ which is distributed as a unit normal deviate, whatever the values the
true $\sigma_i^2$ may chance to possess.

Finally, the remarks following equation (7) above should be somewhat
amplified. The equation asserts that the distribution of any arbitrary function $x$,
defined by (3), is

(23)  $p(x) = \int\cdots\int\dfrac{1}{\sqrt{2\pi\Sigma\lambda_i\sigma_i^2}}\exp\left\{-\tfrac{1}{2}\dfrac{h^2(s^2, x)}{\Sigma\lambda_i\sigma_i^2}\right\}\dfrac{\partial h(s^2, x)}{\partial x}\prod_i\left(p(s_i^2)\,ds_i^2\right),$

where $h(s^2, x)$ is the function obtained by solving out (3) for $y - \eta$. On carrying
out the integrations in (23) we shall in general obtain $p(x)$ as a function of $x$ and
$\sigma_i^2$. Our argument is that if $h$ be chosen properly the $\sigma_i^2$ will disappear from
$p(x)$, and $x$ will appear only in the form of the unit normal probability function.

To find $h(s^2, x)$ by a direct process of numerical integration would appear to
involve in the first instance the choice of a network of points for $x$ and $s_i^2$.
Suppose the range of $x$ is covered by $n_x$ points and the range of $s_i^2$ by $n_i$ points.
We may then as an approximation look on our task as that of finding the $(n_x\prod_i n_i)$
values of $h(s^2, x)$ corresponding to this network. Since (23) is to be true for all $x$
and $\sigma_i^2$, we can take in turn $n_i$ values of $\sigma_i^2$, and then (23) can be replaced by
$(n_x\prod_i n_i)$ simultaneous equations (it would be necessary to use some formula
expressing $\partial h(s^2, x)/\partial x$ in terms of values of $h(s^2, x)$ at discrete values of $x$ or
conceivably this may be avoided if we work with the integrated form). With
a proper choice of the points for $x$, $s_i^2$, and $\sigma_i^2$, we might expect to evaluate the
series $h(s^2, x)$ to any required degree of accuracy, but clearly as a general process
to be used over a whole range of values $f_i$ this approach would be too laborious.
It may indeed be queried whether theoretically, with an indefinitely fine
network of points, we shall be led to a unique function $h(s^2, x)$ with the common
sense properties, which, from general statistical considerations, we know it
should have in order to be acceptable. As with integral equations of a simpler
character, the passage from a discrete network to a continuum may raise
problems, but it is the author's opinion that the infinite ranges of $x$ and $s_i^2$ give us the
freedom which we require in the solution.

The author, however, prefers to approach the problem from the numerical
behavior of the series, of which (15) gives the general terms. Here the practical
issue appears to be to investigate the relation between the magnitude of the last
term retained and the $f_i$. The author hopes in a further paper to give some
results of an investigation of this character and also some tables facilitating the
calculation of $h(s^2, x)$.
PROBABILITY SCHEMES WITH CONTAGION IN SPACE AND TIME¹

By Félix Cernuschi² and Louis Castagnetto
Harvard University

1. Summary. In many natural assemblies of elements, the probability of
an event for a given element depends not only on the intrinsic nature of that
particular element, but also on the states of some or all of the rest of the elements
belonging to the same assembly. On the basis of this general idea of "contagion"
some urn schemes are developed in this paper in which one has contagious
influence in space and time. The most interesting result found is that in general
the points of convergence of the probability of the assembly are given by some
of the roots of an equation $p = f(p)$ and that some of these roots, between zero
and one, represent stable states of the assembly, or points of convergence, and
others represent unstable ones, or points of divergence. The two neighboring
roots (if they are single) of a root representing a point of convergence are
unstable values of the probability. Consequently, under certain conditions, the
limiting probability may be made to have a finite jump by changing the initial
probability by an arbitrarily small amount. The concrete cases developed in
this paper can be considerably extended by similar methods by assuming more
complicated and general assemblies and laws of contagion.

¹ On the suggestion of the referee, some parts of the original paper were deleted and
some mathematical simplifications were introduced.

² Research Associate at Harvard Astronomical Observatory and Guggenheim Fellow.
2. Introduction. In the known probability schemes of contagion of
Eggenberger and Polya [1], Greenwood and Yule [2], Lüders [3], Neyman [4], Feller [5]
and others [6], as well as in Markoff chains, different ways are considered in
which the previous results in a definite series of trials may influence the
probabilities of the future ones. All of these schemes consider possible influences of
the results of the different trials along the time axis, and consequently might
be called schemes of contagion in one dimension and one direction.

In many natural assemblies of individuals or elements, the probability of an
event per individual or element depends not only on the intrinsic nature of the
considered element but also on the states of the rest of the elements belonging
to the same assembly.

The purpose of this paper is to develop some simple schemes with urns in
which there is a contagious influence in space and time and to show some of their
consequences. The method which we have used to treat certain concrete cases
could be applied to more complicated assemblies and laws of influence in space
and time.


3. Scheme of a closed assembly of urns in two dimensions. Let us consider
a set of $N$ urns arranged on a closed surface in such a way that each one of them
is surrounded by $m$ others. Let each urn contain a finite number of black and
white balls. In this paper the probability associated with an urn will refer to
the probability of obtaining a white ball if a single ball is drawn at random from
the urn. We shall assume that the initial probabilities are equal for all of the
urns and that the following law of influence holds: When, after a collective
trial, one finds that the ball drawn from a certain arbitrary urn, taken as the
central one, is white and that the corresponding results of the $m$ surrounding
urns give $l$ white and $s$ black balls, one multiplies the probability of obtaining a
white ball out of the central urn by the factor $\alpha_{1,1}^{l}\alpha_{1,2}^{s}$; if the ball drawn from the
central urn were black, without changing the given results of the surrounding
urns, one multiplies the considered probability by the factor $\alpha_{2,1}^{l}\alpha_{2,2}^{s}$. Under
the specified conditions, it is easily seen that the probability of obtaining a white
ball from a definite urn at the $i + 1$ trial will be, by considering all the possible
alternatives:

(1)  $p_{i+1} = \sum_{j=0}^{m}\dfrac{m!}{j!\,(m - j)!}\left[p_i^2(p_i\alpha_{1,1})^j(q_i\alpha_{1,2})^{m-j} + p_i q_i(p_i\alpha_{2,1})^j(q_i\alpha_{2,2})^{m-j}\right] = p_i^2(p_i\alpha_{1,1} + q_i\alpha_{1,2})^m + p_i q_i(p_i\alpha_{2,1} + q_i\alpha_{2,2})^m,$

where:

$p_i + q_i = 1.$

Consequently $p_i$ either converges to a root of the equation $p = f(p)$ or tends to
infinity. As a probability greater than one or smaller than zero has no meaning,
we have to study the function $y = f(p)$ between zero and one. In (1) we have
given an implicit form for $y = f(p)$, corresponding to a particular case of
influence; by changing the law of influence we change the function $f(p)$. In general
one can find graphically the roots of equation $p = f(p)$ by plotting $y = f(p)$ and
$y = p$ and by determining the intersections of these two lines in the range
$0 < p < 1$. Later we shall give the values of these roots for some concrete
examples. From what we have shown it follows that if, for the considered
assembly of urns and for especially chosen values of the parameters of
interconnection and initial probabilities, the probability tends to some equilibrium
value, this must be a root of the equation $p = f(p)$. As we shall see later, the
roots in the range $0 < p < 1$ may represent stable or unstable states of the
assembly.

Let us consider now a general method for finding the explicit form of the
function $f(p)$ corresponding to laws of influence similar to the one used by Polya.

Assume that the trial $i$ results in the drawing of $l$ white balls and $s$ black balls
from the $m$ urns surrounding the central one. Then we add $lw_1$ white and
$sb_1$ black balls to the central urn if the result of the central urn was white, and
$lw_2$ white and $sb_2$ black balls if it was black. It is easy to show that under these
conditions the probability in the trial $i + 1$ is related to the probability in
trial $i$ by the following formula:

( 2 ) 


P, +1 = Pi [' t‘ v ' [h -- ti ' (p, t\ n + <S‘) M ] 


dt 


+ (i - po [ 1 -- 1 

Jo 


L^l~^^ t( p l <l! ,, + <jAT 


1 dt 

JU-la-l 


where $W_i$ and $N_i$ are the number of white balls and the total number of balls,
respectively, in the central urn before trial $i$. Relation (2) permits us to study
several interesting schemes. It is easy to see that all the possible schemes which
can be represented by relations of type (2) give only values of the probability
in the interval zero and one; and consequently we do not need to make the
restriction in the analysis of the equation $p = f(p)$ that was necessary in the
previous scheme, represented by equation (1).

For the case w L = h = ci, w* = bi = c 2 , we obtain from (2) 


(3) 


Pi+l = Vx 


rV i -t- mp, c i 
N t -f met 


+ (1 — Pi) - 


Ni + met 


If = c 2 , (3) givos 

(4) Pi h = P.. 

If one takes c x = kiN, and c<, = I^Ni (3) becomes 


(5) 


Pt+i = p. 


p, + mkip 
1 + mki 


- + (1 - Pi) 


Pi + m?c 2 p , 

1 + inlet 


f(p>) 


and the equation p = /(p) has, in this case, the roots 0 and 1. 
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When w i = h 2 — kiNi and hi = iv 2 = Tc^TV,, one lias to replace h(d/dfi) hy 
k(d/dh) in the second term of (2); then if we take m = 2, 


(6) Pv+i = 


Vi + 2fe t 
1 + 2fcj 


+ p t q, 


(r 


Pi ■+■ fa 


4- 27c 2 


1 -f- /ci 4" 7c 2 


- 3 


4- 2fci 

4- 2/ci 




/fp,). 


In particular, if /ci = 7c 2 = 7c, one obtains 


(7) Pi+i = ^ [47cp 2 < - (4fe - l)p, 4- 27c] = /(p,), 

and the solutions of the equation p = /(p) arc p — \ and 1. By considering 
the behavior of y = f(p) one finds that the stable solution is given by the root 
consequently if one starts with any value of 0 < p < 1 the probability tends 
to the limiting value 7? If 7cj. = 0, fa ^ 0, by simple calculations, one obtains 
from (6) that the solutions of p = /(p), in this case, are zero and one 
The equation p = /(p), as given by (6), always has the solution 1. In order 
to have the other two roots real, one has to satisfy: 

fa(l 4* 2 fa) (2 4- /ci 4- 3 ItiY > 4(1 + 7ci 4- /c 2 ) 

[(fci + fa) 2 4- 2(7ci — fa) — 4 fa]. 


A simple and interesting application of relation (2) is for the case of two urns, 
characterized by m = 1. Prom (2) we obtain: 


(9) P.+i 
where 




4- + 1 - 


+ 7v, 1 1 + fa 


0- 


Kix) 


wi = kiN,, hi — kiNi , Wi = faTV,, fa = k t N\ . 

The equation p = f(p), as given by (9) has-the roots 0 and 1; and one may fix 
"the value of the third root by conveniently choosing the values of the parameters. 

Applying (2) for an arbitrary value of $m$ and integrating by parts, it is seen
that in general the equation $p = f(p)$ is of degree $m + 2$ and consequently, by
choosing appropriate values for the parameters $k_1$, $k_2$, $k_3$, $k_4$, each of which may
be between $-1$ and $\infty$, one can expect several roots in the range $0 < p < 1$.
One can easily generalize our relation (2) for cases in which $w_1$, $w_2$, $b_1$, $b_2$ are
given functions of the probability $p_i$. Even in this most general case it is simple
to see that one would have a recursion formula of the type $p_{i+1} = f(p_i)$ and, as
in the elementary cases which we have considered, the points of equilibrium of
the closed assembly of urns will be given by those solutions, in the range $0 \le p
\le 1$, of the equation $p = f(p)$, where the derivative of $y = f(p)$ is negative.
Consequently the two neighboring roots, if they are single, of a root representing a
point of convergence are unstable values of the probability. Therefore, under
certain conditions, the limiting probability may take a finite jump if the initial
probability is changed by an arbitrarily small amount. This is, we think, the
most important consequence of the contagion schemes that we propose. We
consider that many actual cases of contagion could be better understood by
schemes of the type that we are studying.

Let us consider now some simple cases of relation (1). If we take

$\alpha_{1,1} = \alpha_{2,2} = \alpha_1$, $\alpha_{1,2} = \alpha_{2,1} = \alpha_2$ and $m = 2$,

representing a closed ring of urns, one obtains:

(10)  $p_{i+1} = p_i^2(\alpha_1 p_i + \alpha_2 q_i)^2 + p_i q_i(\alpha_2 p_i + \alpha_1 q_i)^2 = \alpha_1^2 p_i + (p_i^2 - p_i^3)[(\alpha_1 + \alpha_2)^2 - 4\alpha_1^2] = f(p_i).$

The equation $p = f(p)$, corresponding to this recursion formula, always has the
solution $p = 0$. The other two solutions are given by

(11)  $P_{1,2} = \tfrac{1}{2}\left[1 \pm \sqrt{\dfrac{4 - (\alpha_1 + \alpha_2)^2}{4\alpha_1^2 - (\alpha_1 + \alpha_2)^2}}\right].$

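These roots will be between 0 and 1 when

(12)  $2 < \alpha_1 + \alpha_2$, $\;1 > \alpha_1 < \alpha_2$   or   (12')  $2 > \alpha_1 + \alpha_2$, $\;1 < \alpha_1 > \alpha_2$.

We would have $P_1 > 0$ and $P_2 < 0$ if

(13)  $2 < \alpha_1 + \alpha_2$, $\;1 < \alpha_1 < \alpha_2$   or   (13')  $2 > \alpha_1 + \alpha_2$, $\;1 > \alpha_1 > \alpha_2$,

and $P_1 = P_2$ when

(14)  $\alpha_1 + \alpha_2 = 2$, $\;\alpha_1 \ne 1$.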

Let us now study the general behavior of (10). For the conditions (12') we
have:

(15)  $p_{i+1} - p_i = d^2 p_i(p_i - P_1)(p_i - P_2)$

where $d^2 = 4\alpha_1^2 - (\alpha_1 + \alpha_2)^2 > 0$.


If 0 < Pi < Pi, one obtains from (15) by use of elementary algebra: 


(16) 


Phi ~ Vi 


Pi - pi 


I (Pi ~Vi)\<~< 1. 


Consequently if pi > P 2 the sequence p< increases monotonically. Otherwise 
Phi will lie between Pi and p< and will tend to Pi without ever reaching the other 
side of this point. In a similar way it is possible to prove the convergence to a
constant for the most general equations of the type $p = f(p)$ when they have
roots between zero and one.

Let us give some numerical results. For $\alpha_1 = 0.95$ and $\alpha_2 = 1.1$, from (10)
one obtains: $P_2 = 0.1$ and $P_1 = 0.9$. It is easily seen that, in this case, if
$0 < p_1 < 0.1$, the limiting value of $p_i$ will be zero; if $p_1 > 0.1$, the limiting value
will be 0.9. The interesting point is that if the initial probability is in the
neighborhood of 0.1, an infinitesimal change in its value may produce a finite
change in the stable limiting probabilities; and that for the initial probability
equal to 0.1 one would have an unstable equilibrium of the system. This
consideration shows why it is important to know how the probability $p_i$ converges
towards a certain point. As we have previously shown, the points of
convergence are roots of the equation $p = f(p)$ but there are roots which are not points of
convergence.
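The jump in the limiting probability can be reproduced by iterating the ring recursion directly. The sketch below assumes the reading of (10) as $p_{i+1} = p_i^2(\alpha_1 p_i + \alpha_2 q_i)^2 + p_i q_i(\alpha_2 p_i + \alpha_1 q_i)^2$ and of (11) as $P_{1,2} = \tfrac{1}{2}[1 \pm \sqrt{(4 - (\alpha_1 + \alpha_2)^2)/(4\alpha_1^2 - (\alpha_1 + \alpha_2)^2)}]$. The parameter values $\alpha_1 = 0.9$, $\alpha_2 = 1.3$ are hypothetical, chosen only so that condition (12) holds; they are not taken from the paper, and the function names are illustrative.

```python
import math


def ring_map(p, a1, a2):
    """One step of the recursion (10) for a closed ring of urns (m = 2)."""
    q = 1.0 - p
    return p * p * (a1 * p + a2 * q) ** 2 + p * q * (a2 * p + a1 * q) ** 2


def nonzero_roots(a1, a2):
    """Roots of p = f(p) from (11); the first uses the plus sign, the second the minus sign."""
    total_sq = (a1 + a2) ** 2
    ratio = (4.0 - total_sq) / (4.0 * a1 * a1 - total_sq)
    root = math.sqrt(ratio)  # requires ratio >= 0, e.g. under condition (12)
    return 0.5 * (1.0 + root), 0.5 * (1.0 - root)


def iterate(p, a1, a2, steps=500):
    """Apply the recursion a fixed number of times and return the final probability."""
    for _ in range(steps):
        p = ring_map(p, a1, a2)
    return p


if __name__ == "__main__":
    a1, a2 = 0.9, 1.3  # hypothetical values satisfying (12)
    upper, lower = nonzero_roots(a1, a2)
    print("roots:", lower, upper)
    print("start just below the lower root:", iterate(lower - 0.01, a1, a2))
    print("start just above the lower root:", iterate(lower + 0.01, a1, a2))
```

For these illustrative parameters the lower nonzero root acts as the unstable threshold: starting values on either side of it are carried to different limits.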

Similar reasoning could be applied to more complicated systems belonging to
our general scheme of contagion. Consequently, the most important result is
not that the considered assembly may have a probability tending to some value
in the range $0 < p < 1$, but that under certain conditions the limiting probability
may jump from one value to another by changing the initial probability by an
arbitrarily small amount.
FITTING CURVES WITH ZERO OR INFINITE END POINTS

By Edmund Pinney
Oregon State College

The problem of determining a suitable equation to fit an empirically
determined curve over a given interval has been of great importance in statistical
work, in experimental science, and in engineering technology. Since infinitely
many types of equations may be made to fit the data with required accuracy,
the choice of a "suitable" type of equation depends on the qualitative nature
of the empirical curve, on the use to which the equation is to be put, and upon
considerations of simplicity.

As a function type, the polynomial has, because of its simplicity, been
enormously useful. The function type studied here is a little more general than the
polynomial type, being particularly useful in the case of empirical curves that
become zero or infinity at one or both ends of the interval.

Without loss of generality the interval in which the equation is to fit the curve
may be taken as $0 < x < 1$. It is assumed that, by numerical means or
otherwise, a finite set of moments $\mu_m = \int_0^1 y\,x^m\,dx$ may be computed, $y$ being the
ordinate of the empirical curve.
The problem to be considered here is that of determining a function $f(x)$ of
the form

(1)  $f(x) = x^{\alpha}(1 - x)^{\beta}\sum_{p=0}^{n} a_p x^p$,  $R(\alpha) > -1$, $R(\beta) > -1$,

such that

(2)  $\int_0^1 f(x)\,x^m\,dx = \mu_m$

as $m$ ranges from zero to the number of the highest moment known. $f(x)$ is
then an approximation to $y$ which may be written

(3)  $y \approx f(x)$.

Theorem 1°. Given a finite set of moments $\mu_0, \mu_1, \mu_2, \cdots, \mu_n$, and given that
$R(\alpha) > -1$, $R(\beta) > -1$, define

(4)  $S_p(\alpha, \beta) = \dfrac{\Gamma(p + \alpha + 1)}{\Gamma(p + \alpha + \beta + 1)}\sum_{m=0}^{p}(-)^m\binom{p}{m}\dfrac{\Gamma(m + p + \alpha + \beta + 1)}{\Gamma(m + \alpha + 1)}\,\mu_m,$

(5)  $a_k^{(n)} = \dfrac{(-)^k}{k!\,\Gamma(k + \alpha + 1)}\sum_{p=k}^{n}\dfrac{(2p + \alpha + \beta + 1)\,\Gamma(p + k + \alpha + \beta + 1)}{(p - k)!\,\Gamma(p + \beta + 1)}\,S_p(\alpha, \beta),$

(6)  $f(x) = x^{\alpha}(1 - x)^{\beta}\sum_{k=0}^{n} a_k^{(n)} x^k.$

Then $f(x)$ will satisfy (2) for $m = 0, 1, \cdots, n$.

2°. If, in addition to 1°, $\mu_{n+1}$ is known and $\alpha$ and $\beta$ satisfy

(7)  $S_{n+1}(\alpha, \beta) = 0,$

then $f(x)$ will satisfy (2) for $m = n + 1$ also.

3°. If, in addition to 1° and 2°, $\mu_{n+2}$ is also known, and if $\alpha$, $\beta$ also satisfy

(8)  $S_{n+2}(\alpha, \beta) = 0,$

then $f(x)$ will satisfy (2) for $m = n + 2$ as well.

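Proof. Let $P_m^{(\alpha,\beta)}(x)$ be the Jacobi polynomial of order $m$ defined in terms
of the hypergeometric function by

(9)  $P_m^{(\alpha,\beta)}(x) = \dfrac{\Gamma(m + \alpha + 1)}{m!\,\Gamma(\alpha + 1)}\,F(-m,\, m + \alpha + \beta + 1;\, \alpha + 1;\, \tfrac{1}{2} - \tfrac{1}{2}x).$

Let $P_p^{(\alpha,\beta)}(1 - 2\mu)$ symbolically represent the expression gotten by substituting
$\mu_k$ for $x^k$ in the expansion of the polynomial $P_p^{(\alpha,\beta)}(1 - 2x)$. There exist numbers
$A_{m,q}$ such that

(10)  $x^m = \sum_{q=0}^{m} A_{m,q}\,P_q^{(\alpha,\beta)}(1 - 2x).$

Therefore

(11)  $\mu_m = \sum_{q=0}^{m} A_{m,q}\,P_q^{(\alpha,\beta)}(1 - 2\mu).$

For $R(\alpha) > -1$, $R(\beta) > -1$, define

(12)  $f(x) = x^{\alpha}(1 - x)^{\beta}\sum_{p=0}^{n}\dfrac{(2p + \alpha + \beta + 1)\,p!\,\Gamma(p + \alpha + \beta + 1)}{\Gamma(p + \alpha + 1)\,\Gamma(p + \beta + 1)}\,P_p^{(\alpha,\beta)}(1 - 2\mu)\,P_p^{(\alpha,\beta)}(1 - 2x).$

Then by (10), for $m = 0, 1, \cdots, n$,

$\int_0^1 f(x)\,x^m\,dx = \sum_{p=0}^{n}\dfrac{(2p + \alpha + \beta + 1)\,p!\,\Gamma(p + \alpha + \beta + 1)}{\Gamma(p + \alpha + 1)\,\Gamma(p + \beta + 1)}\,P_p^{(\alpha,\beta)}(1 - 2\mu)\sum_{q=0}^{m} A_{m,q}\int_0^1 x^{\alpha}(1 - x)^{\beta}\,P_q^{(\alpha,\beta)}(1 - 2x)\,P_p^{(\alpha,\beta)}(1 - 2x)\,dx.$

By the orthogonality of the Jacobi polynomials, [1; §4.3],

$\int_0^1 f(x)\,x^m\,dx = \sum_{p=0}^{m} A_{m,p}\,P_p^{(\alpha,\beta)}(1 - 2\mu).$

By (11),

$\int_0^1 f(x)\,x^m\,dx = \mu_m$,  $(m = 0, 1, \cdots, n).$

It follows from (2) that $f(x)$ as defined in (12) is the $f(x)$ of (1). It remains to be
shown that (12) may be expressed in the form (4)-(6).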

From (9),

(13)  $P_p^{(\alpha,\beta)}(1 - 2x) = \dfrac{\Gamma(p + \alpha + 1)}{\Gamma(p + \alpha + \beta + 1)}\sum_{m=0}^{p}\dfrac{(-)^m\,\Gamma(p + m + \alpha + \beta + 1)}{m!\,(p - m)!\,\Gamma(m + \alpha + 1)}\,x^m,$

so by (4),

(14)  $P_p^{(\alpha,\beta)}(1 - 2\mu) = \dfrac{1}{p!}\,S_p(\alpha, \beta).$

Inserting (13) and (14) into (12),

$f(x) = x^{\alpha}(1 - x)^{\beta}\sum_{p=0}^{n}\dfrac{2p + \alpha + \beta + 1}{\Gamma(p + \beta + 1)}\sum_{k=0}^{p}\dfrac{(-)^k\,\Gamma(p + k + \alpha + \beta + 1)}{k!\,(p - k)!\,\Gamma(k + \alpha + 1)}\,x^k\,S_p(\alpha, \beta)$

$\quad = x^{\alpha}(1 - x)^{\beta}\sum_{k=0}^{n}\dfrac{(-)^k x^k}{k!\,\Gamma(k + \alpha + 1)}\sum_{p=k}^{n}\dfrac{(2p + \alpha + \beta + 1)\,\Gamma(p + k + \alpha + \beta + 1)}{(p - k)!\,\Gamma(p + \beta + 1)}\,S_p(\alpha, \beta)$

$\quad = x^{\alpha}(1 - x)^{\beta}\sum_{k=0}^{n} a_k^{(n)} x^k,$

by (5), so the $f(x)$ of (12) may be expressed in the form (4)-(6), and part 1° of
the theorem is established.

If (7) holds, by (5), $a_k^{(n+1)} = a_k^{(n)}$ for $k = 0, 1, \cdots, n$, and $a_{n+1}^{(n+1)} = 0$.
Therefore, in (6),

$f(x) = x^{\alpha}(1 - x)^{\beta}\sum_{k=0}^{n+1} a_k^{(n+1)} x^k,$

and by part 1°, for the case in which $n$ is replaced by $n + 1$, it follows that (2)
holds for $m = n + 1$, so part 2° is established. The establishment of part 3°
is essentially the same.
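Part 1° of the theorem is easy to try numerically. The following is a minimal sketch of (4) and (5) for real $\alpha$, $\beta$, using only the standard library; it assumes the forms of (4)-(6) given above, and the function names are illustrative. `math.gamma` does not accept complex arguments and raises an error at zero or negative integers, so the sketch also assumes that $\alpha + \beta + 1$ is not such a value.

```python
import math


def s_p(p, alpha, beta, mu):
    """S_p(alpha, beta) of equation (4), using the moments mu[0], ..., mu[p]."""
    total = 0.0
    for m in range(p + 1):
        total += ((-1) ** m * math.comb(p, m)
                  * math.gamma(m + p + alpha + beta + 1)
                  / math.gamma(m + alpha + 1) * mu[m])
    return math.gamma(p + alpha + 1) / math.gamma(p + alpha + beta + 1) * total


def coefficients(mu, alpha, beta):
    """Coefficients a_0, ..., a_n of equation (5), where n = len(mu) - 1."""
    n = len(mu) - 1
    s = [s_p(p, alpha, beta, mu) for p in range(n + 1)]
    result = []
    for k in range(n + 1):
        inner = sum((2 * p + alpha + beta + 1)
                    * math.gamma(p + k + alpha + beta + 1)
                    / (math.factorial(p - k) * math.gamma(p + beta + 1)) * s[p]
                    for p in range(k, n + 1))
        result.append((-1) ** k / (math.factorial(k) * math.gamma(k + alpha + 1)) * inner)
    return result


def fitted(x, a, alpha, beta):
    """Evaluate f(x) = x^alpha (1 - x)^beta * sum_k a_k x^k, equation (6)."""
    return x ** alpha * (1.0 - x) ** beta * sum(c * x ** k for k, c in enumerate(a))


if __name__ == "__main__":
    # Moments of y = x^0.5 (1 - x)(1 + 2x) on (0, 1), via B(a, 2) = 1 / (a (a + 1)).
    mu = [1 / (1.5 * 2.5) + 2 / (2.5 * 3.5), 1 / (2.5 * 3.5) + 2 / (3.5 * 4.5)]
    print(coefficients(mu, alpha=0.5, beta=1.0))
```

When the empirical curve is itself of the form (1), the fit should recover its polynomial factor, which is what the demonstration at the bottom illustrates.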

In applying this theorem to the problem of empirical curve fitting, it follows
from (6) that the constants $\alpha$ and $\beta$ should differ from zero only if the empirical
curve approaches zero or infinity at one or both of its endpoints. With this
in mind the following rules may be stated:

Case A. If, in the empirical curve, $f(0) \ne 0$ or $\infty$, and $f(1) \ne 0$ or $\infty$, set
$\alpha = \beta = 0$, and let $n$ be one less than the number of moments that it is desired
to fit.

Case B. If $f(0) = 0$ or $\infty$ and $f(1) \ne 0$ or $\infty$, set $\beta = 0$ and determine $\alpha$ from
(7), $n$ being two less than the number of moments that it is desired to fit.

Case C. If $f(0) \ne 0$ or $\infty$ and $f(1) = 0$ or $\infty$, set $\alpha = 0$ and determine $\beta$
from (7), $n$ being two less than the number of moments that it is desired to fit.

Case D. If $f(0) = 0$ or $\infty$ and $f(1) = 0$ or $\infty$, determine both $\alpha$ and $\beta$ from
the two equations (7) and (8), $n$ being three less than the number of moments
that it is desired to fit.

It may happen that these processes cannot be carried out, or at least cannot be
conveniently carried out. If this is the case, $\alpha$ or $\beta$ may be set arbitrarily and $n$
taken as one unit higher than before, or both $\alpha$ and $\beta$ may be set, and $n$ taken
as two units higher than before.

In Case D, above, the solution of equations (7) and (8) may often prove
difficult, making it advisable to follow the suggestions of the last paragraph.
In certain special cases, however, their solution is not difficult.

Suppose, for example, the moments satisfied the equations 



If this is substituted into (4), and the order of summation reversed, on making 
use of the identity 

fie') T ( n \ T ^ V it c ^ - /_y» r(a)r(a - v + l) 

V p W rfa + v) K j r(« - v - n + i)r(» + v) ’ 


one obtains 

(17) $S_p(\alpha, \beta) = (-1)^p S_p(\beta, \alpha)$. 

Therefore 

(18) $S_{2p+1}(\alpha, \alpha) = 0$. 

When n is an integer, either n + 1 or n + 2 is odd. Therefore when (15) 
holds, one of (7) or (8) will be satisfied identically if we take $\beta = \alpha$. The 
other may then be solved for $\alpha$. 

As an example, suppose one had the moments mo = 1, mi = 2 , M 2 = tt, Ms — 

M4 = -£hs, and wished to obtain an f(x) such that f(0) = 0, f(1) = 0. In this 
case n = 2, and (15) is satisfied. It follows that (7) is satisfied identically when 
$\beta = \alpha$, and (8) gives 

r(2« + 5) L „ r(2<* + 6) f_l\ , r(2a +7)(7\ 
r(« + 1) ^ r(« + 2) \ 2 ) ^ r(« + 3) \2iJ 

. 4 r(2 a + 8) /_ J_\ r(2to + 9) / 3l\ 

r (a + 4) V 16/ r(a + 5) \240/ 

This easily reduces to 

i - 4 ffl + 5 / 2 _l 7 (« + 5/2X“ + 3) 

a + 1 (a + 1)(<* + 2) 

A (« + 6/2) (a + 7/2) , 31 (a + 5/2) (a + 7/2) _ A 

(a + l)(a + 2) 7" 240 (« + 1)(« + 2) 

which reduces to the quadratic 

$$4\alpha^2 - 6\alpha + 5 = 0,$$

from which 

(19) $\alpha = \beta = 3/4 \pm (1/4)\sqrt{11}\, i$. 

These may be substituted into (4)-(6) to complete the solution. 
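
As a quick arithmetic check, the roots of the quadratic $4\alpha^2 - 6\alpha + 5 = 0$ can be computed directly. The discriminant is $36 - 80 = -44$, so the roots are complex, $3/4 \pm (\sqrt{11}/4)\,i$. The short sketch below only verifies that arithmetic; the helper name `quadratic_roots` is introduced here for illustration.

```python
import cmath


def quadratic_roots(a, b, c):
    """Both roots of a*z**2 + b*z + c = 0, allowing a negative discriminant."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


roots = quadratic_roots(4, -6, 5)
# Closed form of the root with positive imaginary part: 3/4 + (sqrt(11)/4) i.
expected = complex(0.75, 11 ** 0.5 / 4)
residuals = [abs(4 * z * z - 6 * z + 5) for z in roots]
```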
CONSISTENCY OF SEQUENTIAL BINOMIAL ESTIMATES 

By J. Wolfowitz 
Columbia University 

The notion of consistency of an estimate, introduced by R. A. Fisher, applies 
to a sequence of estimates which converge stochastically, with boundlessly 
increasing sample size, to the parameter (or parameters) being estimated. Each 
estimate is a function of a sample of observations, the number in each sample 
being determined independently of the observations themselves. In sequential 
estimation, on the other hand, the number of observations is itself a chance 
variable, determined by the sequence of observations and the application to 
them of a rule which may be part of a sequential test. In what follows we 
shall consider that the operation of sequential estimation is associated with a 
sequential test.¹ 

The advantage of using consistent estimates is such as to suggest extension 
of the idea of consistency to sequential estimation. In the present paper we 
shall be concerned only with the estimation of a binomial probability (p say). 
The obvious extension is that a sequence of estimates, each with its associated 
test, is consistent if the estimates converge stochastically to p. 

Since the number of observations required by a sequential test is a chance 
variable, a parallel to the classical sequence of samples of increasing size would 
be a sequence of sequential tests whose average (in some sense) sample sizes 
increase without limit. It seems reasonable to associate only such a sequence 
of estimates with this sequence of tests as will converge stochastically to p, 
i.e., be consistent. 

Let z be a chance variable which takes the distinct values $c_1$ and $c_2$ with 
probabilities p, 0 < p < 1, and q = 1 − p, respectively. Let $z_1, \ldots, z_n$ be a sequence 
of independent observations on z which terminates with the nth according to the 
specific sequential test under consideration. Denote by x and y, respectively, 
the number of observations $c_2$ and $c_1$ in this sequence. Then x, y and n = x + y 
are all chance variables. The couple g = (x, y) is called a boundary point of 
index n (see [1]). The sequence of observations which terminates at g is called a 
path. Let k(g) denote the number of paths which terminate at g, and let k*(g) 
denote the number of these paths whose first observation is $c_1$. The “points” 
on the various paths together with all the points g constitute the “region” under 
discussion. 

Let $P\{n = j\}$ denote the probability of the relation in braces. If 

$$\sum_{j=1}^{\infty} P\{n = j\} = 1,$$

the region is called closed. Only closed regions will be considered below, so that 
this assumption will henceforth be made without explicit formulation. It has 
been shown by Girshick, Mosteller, and Savage [1] that $\hat p(g) = k^*(g)/k(g)$ 
is an unbiased estimate of p for any closed region R, i.e., 

$$\sum \hat p(g)\, k(g)\, p^{y} q^{x} = p,$$

where the summation takes place over all the boundary points g of R. For 
many important regions this estimate is the unique unbiased estimate. 
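
The unbiasedness of $k^*(g)/k(g)$ can be checked by brute force on a small closed region. The sketch below is illustrative only: the stopping rule `stop_rule` (stop as soon as either outcome has occurred twice) and the function name `boundary_counts` are assumptions made for the example, not taken from the paper. It counts k(g) and k*(g) for each boundary point and then evaluates the two sums in exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction


def boundary_counts(stop, max_steps):
    """Return {g: (k, k_star)} for every boundary point g = (x, y).

    k      = number of paths terminating at g
    k_star = number of those paths whose first observation is c1
    stop(x, y) is a hypothetical rule: True when sampling ends at (x, y).
    """
    live = {(0, 0): (1, 0)}  # continuing points: (paths, paths starting with c1)
    boundary = defaultdict(lambda: [0, 0])
    for step in range(max_steps):
        nxt = defaultdict(lambda: [0, 0])
        for (x, y), (k, k_star) in live.items():
            for dx, dy in ((1, 0), (0, 1)):  # observe c2 (x += 1) or c1 (y += 1)
                g = (x + dx, y + dy)
                # On the first step a path starts with c1 exactly when dy == 1.
                ks = (k if dy == 1 else 0) if step == 0 else k_star
                target = boundary if stop(*g) else nxt
                target[g][0] += k
                target[g][1] += ks
        live = {g: (v[0], v[1]) for g, v in nxt.items()}
    if live:
        raise ValueError("region is not closed within max_steps observations")
    return {g: (v[0], v[1]) for g, v in boundary.items()}


def stop_rule(x, y):
    # Illustrative closed region: stop once either outcome has occurred twice.
    return x >= 2 or y >= 2


p = Fraction(1, 3)
q = 1 - p
counts = boundary_counts(stop_rule, max_steps=3)
# Closedness: the boundary probabilities k(g) p^y q^x sum to one.
total_probability = sum(k * p**y * q**x for (x, y), (k, _) in counts.items())
# Expectation of the estimate k*(g)/k(g) over the boundary points.
expected_estimate = sum(Fraction(ks, k) * k * p**y * q**x
                        for (x, y), (k, ks) in counts.items())
```

For this region the boundary points are (2, 0), (0, 2), (2, 1) and (1, 2), and the expectation of the estimate equals p exactly, as the cited result requires.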

Let there be given an infinite sequence of sequential tests with each of which 
we associate the estimate $\hat p(g)$. Consider the ith one of these, and let $n_{0i}$ be 
the smallest number of observations required for a decision, i.e., $n_{0i}$ is the smallest 
value of j for which $P\{n = j\} \neq 0$. The theorem proved below asserts that if 
$n_{0i}$ approaches infinity with i, the estimate $\hat p(g)$ converges stochastically to p. 
To put it in other words: if $T_1, T_2, \ldots$ is the sequence of tests, and $\epsilon_1$ and $\epsilon_2$ 
are arbitrarily small positive numbers, there exists a positive number $J(\epsilon_1, \epsilon_2)$ 
such that, for all $T_i$ such that $i > J$, 

$$P\{|\hat p(g) - p| > \epsilon_1\} < \epsilon_2$$

when $n_{0i} \to \infty$. An important example of such a sequence is that of the Wald 
sequential binomial tests [2] obtained as follows: Let $\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots$ and 
$\beta_1, \beta_2, \ldots, \beta_i, \ldots$ be two sequences of positive numbers all of which are less 
than 1/2 and which approach zero as $i \to \infty$. Let $p_0$ and $p_1$, $0 < p_0 < p_1 < 1$, 
be two fixed numbers, 

$$c_1 = \log\frac{p_1}{p_0}, \qquad c_2 = \log\frac{1 - p_1}{1 - p_0}, \qquad Z_n = \sum_{j=1}^{n} z_j.$$

Finally let the rule for terminating the process of drawing observations be as 
follows for the ith test $T_i$: The process of drawing observations terminates at 
the smallest integer n for which either 

$$Z_n \geq \log\frac{1 - \beta_i}{\alpha_i} \qquad \text{or} \qquad Z_n \leq \log\frac{\beta_i}{1 - \alpha_i}.$$

Since $(1 - \beta_i)/\alpha_i \to \infty$ and $\beta_i/(1 - \alpha_i) \to 0$ while $c_1$ and $c_2$ are constant, it is 
evident that the hypothesis of the theorem is satisfied. 

¹ Really all that is required is a rule for terminating the observations such that its region 
R is closed (see below). However, we defer to conventional statistical usage in referring 
to “tests.” 
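
To see why the hypothesis of the theorem holds for the Wald tests, note that the quickest way to reach the upper limit $\log((1-\beta)/\alpha)$ is an unbroken run of observations $c_1 = \log(p_1/p_0)$, and the quickest way to reach the lower limit $\log(\beta/(1-\alpha))$ is an unbroken run of $c_2 = \log((1-p_1)/(1-p_0))$. The smallest possible stopping time is therefore the length of the shorter of these two runs. The following is a minimal sketch under the assumptions of the text ($0 < p_0 < p_1 < 1$, error levels below 1/2); the function name `smallest_stopping_time` and the numerical values are illustrative choices, not taken from the paper.

```python
import math


def smallest_stopping_time(p0, p1, alpha, beta):
    """Smallest n at which the Wald rule can terminate (n_0 in the text).

    Assumes 0 < p0 < p1 < 1 and alpha, beta < 1/2, so that c1 > 0 > c2,
    the upper limit is positive and the lower limit is negative.
    """
    c1 = math.log(p1 / p0)              # increment when c1 is observed
    c2 = math.log((1 - p1) / (1 - p0))  # increment when c2 is observed
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    n = 1
    # A run of n identical observations gives Z_n = n*c1 or Z_n = n*c2;
    # no other path can reach either limit sooner.
    while n * c1 < upper and n * c2 > lower:
        n += 1
    return n


n0_loose = smallest_stopping_time(0.4, 0.6, alpha=0.05, beta=0.05)
n0_tight = smallest_stopping_time(0.4, 0.6, alpha=0.001, beta=0.001)
```

Tightening the error levels pushes both limits outward while the increments stay fixed, so the smallest stopping time grows, which is the behaviour the theorem requires.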

The property of being unbiased is not generally considered an indispensable 
characteristic of an optimum estimate, while consistency is generally so regarded. 
Our theorem shows that $\hat p(g)$ enjoys the latter property with respect to important 
sequences of sequential tests. 

Theorem: Let $T_1, \ldots, T_i, \ldots$ be a sequence of sequential binomial tests. 
For the ith test $T_i$ let $n_{0i}$ be the smallest integer such that $P\{n = n_{0i}\} \neq 0$. Finally 
let $n_{0i} \to \infty$ as $i \to \infty$. Then $\hat p(g)$ converges stochastically to p as $i \to \infty$. 

Proof: For typographic simplicity we shall use $n_0$ as the designation of the 
generic element of the sequence $n_{01}, n_{02}, \ldots$. No confusion will be caused 
thereby. 

Let $n' = n_0 - 1$, and $\delta_1 > 0$ and $\delta_2 > 0$ be arbitrarily small fixed numbers. 
Let k'(g) be the number of paths which end at the point g and are such that 
$|y'/n' - p| < \delta_1$, where y' is the number of observations $c_1$ among the first n' 
observations. We then have 
Lemma 1. For $n_0$ sufficiently large 

(1) $\sum_{g \in B} k'(g)\, p^{y} q^{x} > 1 - \delta_2,$ 

where B is the set of boundary points of R. 

Proof: Consider the totality {h} of all points h = (x', y'), with x' + y' = n'. 
Here x' and y' denote, respectively, the number of observations $c_2$ and $c_1$ in the 
sequence of the first n' observations on z. Let $k_0(h)$ denote the number of paths 
to h. Let C denote the set of points h such that $|y'/n' - p| < \delta_1$. If $n_0$ is 
large enough we have, by the law of large numbers, 

$$\sum_{h \in C} k_0(h)\, p^{y'} q^{x'} > 1 - \delta_2.$$

Let k(h, g) be the number of paths from h to g. From Theorem 2' of [3] it 
follows that 

(A) $\sum_{g \in B} k(h, g)\, p^{y} q^{x} = p^{y'} q^{x'}.$ 

Also from the definitions of the various symbols involved it readily follows that 

$$k'(g) = \sum_{h \in C} k_0(h)\, k(h, g).$$

Hence 

$$\sum_{g \in B} k'(g)\, p^{y} q^{x} = \sum_{g \in B} \Big( \sum_{h \in C} k_0(h)\, k(h, g) \Big) p^{y} q^{x} = \sum_{h \in C} k_0(h) \Big( \sum_{g \in B} k(h, g)\, p^{y} q^{x} \Big) = \sum_{h \in C} k_0(h)\, p^{y'} q^{x'} > 1 - \delta_2.$$

This proves Lemma 1. 

Let $\xi(g) = k'(g)/k(g)$. Thus $\xi(g)$ is a chance variable, being a function 
of the chance point g. 

Lemma 2. Let $\delta_3$ and $\delta_4$ be arbitrarily small positive numbers. For $n_0$ sufficiently 
large 

(2) $P\{\xi(g) \geq 1 - \delta_3\} > 1 - \delta_4.$ 

Proof: If (2) were not true, we would have 

(3) $E\,\xi = \sum k'(g)\, p^{y} q^{x} \leq (1 - \delta_4) + (1 - \delta_3)\delta_4 = 1 - \delta_3 \delta_4.$ 

Choose the $\delta_2$ of Lemma 1 so that $\delta_2 < \delta_3 \delta_4$. For some large value of $n_0$ we 
would then have a contradiction between (1) and (3). This proves the lemma. 

Let g be any boundary point. Consider any path whose y' is such that 
$|y'/n' - p| < \delta_1$; let us call such a path one of type T. Consider the terminal 
sequence S of this path, 

$$S: z_{n_0}, \ldots, z_n.$$

This sequence, together with g = (x, y), uniquely determines y'. Any permutation 
of y' elements $c_1$ and n' − y' = x' elements $c_2$ may serve as the initial sequence 
of n' observations of a path which terminates at g and has the terminal sequence 
S. For no boundary point is of index smaller than $n_0$, so that under permutation 
of the first n' observations a path remains a path, i.e., the process of taking 
observations will not terminate prematurely as a result of the permuting of the 
elements. Of these permutations a proportion y'/n' begin with the element $c_1$. 
We deal in this manner with all the different terminal sequences of the paths of 
type T which end at g. Let k*'(g) be the number of these which begin with $c_1$. 
We obtain 

Lemma 3. For all g such that $k'(g) \neq 0$ 

$$\left| \frac{k^{*\prime}(g)}{k'(g)} - p \right| < \delta_1.$$

Putting Lemmas 2 and 3 together we have 

Lemma 4. As $n_0 \to \infty$, $k^{*\prime}(g)/k(g)$ converges stochastically to p. 

Now it follows in a manner similar to that of Lemma 2 that, as $n_0 \to \infty$, 
$k^{*\prime}(g)/k^*(g)$ converges stochastically to one. This, together with Lemma 4, 
proves the theorem. 
BOOK REVIEWS 

Mathematical Methods of Statistics. Harald Cramér. Uppsala, Sweden: 
Almqvist and Wiksell, 1945. pp. xvi, 575. (Princeton, N. J.: Princeton 
University Press, 1946. $6.00) 

Reviewed by Will Feller 
Cornell University 


This book represents a contribution of a novel kind to the statistical literature 
and will render valuable services both as textbook and reference book. Of its 
three parts the first one (134 pages) is entitled Mathematical Introduction and 
develops the necessary formal mathematical tools. The second part (180 pages) 
is devoted to Random Variables and Probability Distributions, that is to say, to a 
chapter of the modern theory of probability. The third, and main, part of the 
book (some 233 pages) is entitled Statistical Inference. Ordinarily these three 
topics would require consultation of three or more books, and these would rarely 
be found on the same shelf. However, the masterly exposition succeeds in creating 
the impression of natural unity and harmony. The ideas are developed with 
elegance and apparent ease as if the line of presentation followed a well explored 
path. The uninitiated will not notice how unconventional the treatment is and 
how the very selection of topics depends on the author’s scientific personality. 

It is hardly necessary to point out that Cramér’s book fills an urgent need. 
The emergence of statistical theory and methodology as an exact science, firmly 
grounded in mathematical probability, is only of recent date. Its rapid development 
went hand in hand with an extraordinary increase of the number and importance 
of its various applications. Under such circumstances there was 
naturally little time for an exposition of the theoretical foundations and ramifications. 
Modern statistical inference has its roots in the classical limit theorems 
of probability. Now classical probability used to consist of a bewildering 
collection of special and mutually uncorrelated problems; unified guiding principles 
and methods are a rather new development and have not yet found expression 
in the textbook literature. The original investigations are usually written in an 
exceedingly abstract language and the existing close ties to applications are not 
apparent. Consequently, there is no easy access either to probability or statistics, 
and it is often difficult to establish whether, or to what extent, various assertions 
have actually been proved. The present book therefore closes a serious 
gap in the literature and will greatly facilitate both teaching and research. 

Of the 12 chapters of the Mathematical Introduction 9 are devoted to the theory 
of measure and integration. The antiquated theory of the so-called Riemann 
integral (kept alive by elementary textbooks) considered only point functions 
y = f(x), where the independent variable is a point. The temperature at a 
given point or the velocity at a given moment are typical examples. Many 
mathematical considerations simplify greatly if from the very beginning also set 
functions y = F(A) are introduced, where the independent variable is a set. 
Typical examples are mass in mechanics, the amount of heat or of electricity, 
area or wealth of a geographic region, and the probability of events (i.e. sets in 
sample space). The Lebesgue-Stieltjes theory frees the concept of integral from 
artificial devices and reduces it to the natural notion of mean values with respect 
to set functions. In a simile, believed to be due to Lebesgue, the Riemann integral 
corresponds to the procedure of a grocer who computes the day's receipts 
by actually adding the several amounts in the order as they had come in. The 
Lebesgue procedure imitates the more intelligent grocer who orders his cash in 
piles of notes and coins of equal denomination and counts them. The analogy 
with the customary procedure of computing mathematical expectation is clear. 
The Lebesgue-Stieltjes integral is conceptually simpler than the Riemann integral 
and can be presented in as simple a way with rigor adequate for elementary textbooks. 
It has become an indispensable tool in probability, statistics, physics, 
and other applied fields. Since it has, unfortunately, not found its way into 
calculus textbooks, physicists are compelled to use the less flexible notion of the 
Dirac δ-function, and the formal mathematical apparatus in general becomes 
unnecessarily clumsy. It is a curious anomaly that so many calculus textbooks 
profess to be written with a view to applications and yet completely disregard 
the most obvious practical needs and that the teaching of practical mathematics 
should remain uninfluenced by the great developments of the last fifty years. 

In such circumstances the chapters on integration will be particularly welcome 
to statisticians as probably the only place in the literature where they will find 
easy access to the theory. Of course, this exposition leads far beyond what the 
average statistician will require under ordinary circumstances and beyond the 
necessary prerequisites of the main body of the book. Of the 88 pages roughly 
half can be omitted at first reading in accordance with detailed instructions given 
in the Preface. The remaining half will form a valuable reference book for 
theorems and tools used occasionally in connection with more delicate parts of 
statistical theory. The mathematical introduction contains also a chapter on 
Fourier integrals (characteristic functions), one on matrices and quadratic forms, 
and finally miscellaneous complements such as orthogonal polynomials, Euler’s 
summation formula, beta and gamma functions, etc. 

The title to the second part, Random Variables and Probability Distributions, 
is the same as that of the author’s well-known Cambridge Tract of 1937. Both 
start with a discussion of the foundations along axiomatic lines. The new treatment 
does not differ essentially from the old one, but some changes are introduced 
which are regrettable in the reviewer’s opinion (in particular axiom 3). 
Otherwise there is practically no overlap between the two expositions. The 1937 
booklet devoted much space to the asymptotic expansions connected with the 
central limit theorem which are due to the author himself. This topic is not 
touched upon in the present book. This is a judicious procedure since the 1937 
booklet is generally accessible (although at present sold out). Instead we now 
find a detailed study of some univariate distributions such as χ², Student’s t, 
Fisher’s z, the Pearson system, etc., none of which were mentioned in the Cambridge 
tract. Similarly, there is now a section on correlation and regression, 
and the normal distributions in several variables. The theory of probability is 
developed only to the extent of the formal theory of distribution functions. This 
implies that even so important a notion as stochastic convergence is treated only 
summarily while the strong law of large numbers falls completely outside the 
framework of the book. This is regrettable inasmuch as the strong law is of 
greater importance than the classical weak law (whose fame rests essentially on 
a classical misunderstanding). It should be mentioned that this second part of 
the book contains some 39 well chosen illustrative exercises the solution of which 
is left to the reader. 

In the main part of the book, entitled Statistical Inference, the outer form 
changes inasmuch as the text there is accompanied by numerous practical examples. 
However, the exposition remains mathematical in nature and the main 
emphasis rests on exact formulations; much attention is paid to the establishment 
of the precise conditions of validity of the individual theorems, their logical 
interrelations and their connections with general probability. The expert will 
find many minor and major improvements in formulations and proofs. They are 
too numerous to be listed here. Suffice it to point out, as a typical example, the 
theorem on pp. 426-27 concerning the limiting form of the χ² distribution with 
estimated parameters; this theorem appears to be more general than usually 
stated and also the proof seems to be novel. The topics treated in the statistical 
part of the book will be seen from the following list of titles to the chapters. 25. 
Preliminary Notions on Sampling. 26. Statistical Inference (general orientation). 
27. Characteristics of Sampling Distributions (moments, semi-invariants, 
corrections for grouping, etc.). 28. Asymptotic Properties of Sampling Distributions 
(moments, extreme values, range, etc.). 29. Exact Sampling Distributions 
(degrees of freedom, Student, Fisher, correlation and regression coefficients, 
partial and multiple correlations, generalized variance, etc.). 30. Tests 
of Goodness of Fit and Allied Tests (treating mostly applications of χ²). 31. 
Tests of Significance for Parameters. 32. Classification of Estimates (sufficient, 
efficient and asymptotically efficient estimates, minimum variance, etc.). 33. 
Methods of Estimation (method of moments, maximum likelihood, χ²-minimum 
methods). 34. Confidence Regions. 35. General Theory of Testing Statistical 
Hypotheses. 36. Analysis of Variance. 37. Some Regression Problems. There 
follow tables of the normal distribution, the χ² and the t-distributions, and a long 
list of references. 

If an expression of wishes for a second edition were permitted, most statisticians 
would probably give first choice to non-parametric and sequential tests. 
It is needless to point out that the latter became public only after completion of 
the Swedish edition of the present book. 

Even this short account will show the extremely wide range of topics and 
theories covered in the book, from abstract integration to randomized experiments. 
They are all presented with uniform lucidity. The exposition throughout 
is formal, and yet inspiring, rigorous and yet never pedantic. It will serve 
as an example worthy of imitation and is an achievement on which the author 
deserves our sincere congratulations. 


The Advanced Theory of Statistics. Vols. I and II. Maurice G. Kendall. 
London: C. Griffin and Co., Ltd. Vol. I. Second ed. revised, 1945; pp. xii, 
457; 50 shillings. Vol. II. 1946; pp. viii, 521; 42 shillings. 

Reviewed by M. S. Bartlett 
Cambridge University and The University of North Carolina 

With the recent appearance of the second volume, it is now possible to review 
as one work this comprehensive treatise. To quote the author’s opening remarks 
to the Preface to Volume I: “The need for a thorough exposition of the 
theory of statistics has been repeatedly emphasized in recent years. The object 
of this book is to develop a systematic treatment of that theory as it exists at the 
present time.” An outline of the contents, which in the two volumes make up 
just on a thousand pages, will indicate that this formidable task has been squarely 
faced by the author, who, when a tentative co-operative venture of writing such 
a treatise was upset by the outbreak of the war, continued alone with the project. 

Volume I contains sixteen chapters. The first six introduce the concept of 
frequency distributions via observational data on groups and aggregates, and 
their mathematical representation (Ch. 1), measures of location and dispersion 
(Ch. 2) and moments and cumulants in general (Ch. 3), characteristic functions 
(Ch. 4), and ending with a description of the standard distribution functions, such 
as the binomial, Poisson, hypergeometric and normal distributions, and the 
Pearson and Gram-Charlier systems. The next section opens with probability 
(Ch. 7) and proceeds to sampling theory (Chs. 8-11), including a chapter (Ch. 10) 
on exact sampling distributions, many of the standard sampling distributions 
being used in this chapter to illustrate the mathematical methods available for 
obtaining sampling distributions. Chapter 11 deals with the general sampling 
theory of cumulants, including a useful reference list of formulae and a demonstration, 
due to the author, of the validity of Fisher’s combinatorial rules for 
obtaining these formulae. The section concludes with a chapter on the Chi-square 
distribution and some of its applications. The last four chapters of 
Volume I deal with association and contingency, correlation, including partial 
and multiple correlation, and rank correlation; this last chapter being a comprehensive 
treatment including comparatively recent results of the author. 

It will be convenient to list also the contents of Volume II before any critical 
comment on either volume. The first section of the second volume comprises 
four chapters on the theory of estimation, including a derivation of the properties 
of the maximum likelihood estimate (Ch. 17) and separate chapters on Fisher’s 
theory of fiducial probability and Neyman’s theory of confidence intervals. The 
second main section, according to the author’s remarks in the preface to Volume 
II, deals with the theory of statistical tests and comprises chapters 21, 23, 24, 26, 
27 and 28; of these, after an introductory chapter (Ch. 21) on tests of significance, 
chapters 23 and 24 cover analysis of variance, chapters 26 and 27 give a fairly 
detailed account of the general theory of significance-tests originated by Neyman 
and Pearson, and Chapter 28 deals with the recently developed techniques of 
multivariate analysis. The remaining chapters are 22 on regression, 25 on the 
design of sampling enquiries, and Chapters 29 and 30 on time-series, another 
subject in which the author has himself taken an active interest. Finally, there 
are two appendices, A consisting of a few addenda to Volume I, and B an extensive 
bibliography of theoretical statistical papers. 

The volumes are attractively printed; and each chapter concludes with a useful 
collection of examples for the reader. 


In any comprehensive treatment of a wide subject there can be no clearly defined 
order of presentation; nevertheless, the author’s order of chapters in Volume 
II and in particular his inclusion of analysis of variance among the chapters on the 
theory of statistical tests is a little puzzling, and the reviewer’s preference would 
have been to see this important subject treated earlier, together with regression 
analysis, and their link with the classical method of least squares more firmly 
outlined. Incidentally, there appears to be no mention of the Fourier analysis 
of observational data except in its relation to periodogram analysis (Ch. 30). 
This change of order would perhaps also have allowed a shift forward of Chapter 
25 on the design of sampling enquiries, and a more compact section on multiple 
correlation, culminating with the chapter on multivariate analysis before the 
chapters on the general theory of statistical inference were begun. 

Another arrangement of rather doubtful value in Volume II is the allocation of 
separate chapters to fiducial probability and to the theory of confidence inter¬ 
vals. The problem of how to deal with a field which is still a battleground is 
admittedly not an easy one, and this particular one is an embarrassment at 
present to many teachers, but it may be questioned whether strict impartiality 
is the best answer. To take a hypothetical example, there would seem to be no 
particular virtue in a textbook which expounded, in parallel, statistical methods 
of inference using direct probabilities and the method of “inverse probability” 
leaving the reader to decide at the end which he should adopt. 

The most criticizable arrangement, however, occurs in Volume I with the late 
and rather scanty treatment of probability in Chapter 7. To begin with examples 
of statistical data is sound, but since the whole conceptual model erected 
is based on probability theory, it does not seem sufficient 
for a reader “who feels keenly on the subject” to do as the author suggests in the 
Preface and read Chapters 7 and 8 after Chapter 1. Even if he does so, he will 
find no very clear exposition of the statistical theory of probability; no mention, 
for example, of the laws of large numbers, whether for simple dichotomies or for 
entire continuous distribution functions, that show how the conceptual model 
corresponds with the empirical notions of “in the long run” or “for 
a large enough sample.” The actual arrangement, moreover, leads to an apparently 
rather arbitrary treatment of theorems on limiting distributions; 
the First Limit Theorem, which deals with the equivalence of the limits of distribution 
function and corresponding characteristic function sequences, is given 
in the chapter on characteristic functions (Ch. 4), and the Central Limit Theorem, 
dealing with the convergence to normality of a sum of n independent random 
variables, is given in the chapter on probability. 

In the proof of the second part of the First Limit Theorem, dealing with the 
conditions under which a sequence of characteristic functions determine 
the limiting distribution function F(x), the author has not yet corrected an error 
that occurred in Cramér’s original version, which Kendall follows (section 4.12). 
Correct conditions for convergence of the distribution function sequence $F_n(x)$ to 
F(x) (at all continuity points of F) are convergence of the characteristic function 
sequence to $\phi(t)$ for all real t, uniformly in at least some finite t interval (cf. H. 
Scheffé, Math. Reviews, Vol. 6 (1945), p. 89). 

Another proof in Volume I which appears to need clarification is the geometrical 
derivation of the distribution of the multiple correlation coefficient in the case 
of a non-zero true correlation (section 15.21). The blunt statement is made, 
following equation (15.51), that the sample correlation coefficient R and an angle 
ψ (defined in the text) are independent, a statement which is incorrect. However, 
if the logic of Fisher’s original derivation is examined, it turns out that the 
relation of R and ψ is only required when the true correlation is zero; under such 
conditions R and ψ are independent. 

In Volume II there is a sentence requiring correction and amplification in the 
derivation (in the case of zero true canonical correlations) of the sampling canoni¬ 
cal correlation distribution (section 28.30). The sentence “Consider the dis¬ 
tribution for a given value of l l3 and z„ • • • ” should be corrected to read "Con¬ 
sider the distribution for a given value of U, + Some justification that 

the distribution is independent of -f- z„ is then still needed 

There is inevitably, owing to the time the book was written, no mention of 
sequential analysis, the sampling technique developed during the war by Wald 
and others and only recently “derestricted”. Again, in chapter 18, where the 
work of Aitken and Silverstone on unbiased estimates with minimum variance is 
referred to, the simple inequality connecting the variance of any unbiased estimate 
with Fisher’s information function throws an interesting new light on this 
aspect of the estimation problem (see, for example, H. Cramér, Mathematical 
Methods of Statistics, section 32.3, or C. R. Rao, Bulletin Calcutta Math. Soc., 
Vol. 37 (1945), p. 81), but was not known to the author when this chapter was 
written. Such omissions are merely an indication of the developing nature of the 
subject, and it is hoped they can be remedied in later editions. There is, however, 
especially in Volume II, an occasional impression of patchiness in the treatment 
not altogether excusable on such grounds. This can perhaps be illustrated 
from the last chapter, a valuable contribution to the still-growing subject of 
time-series, but where the importance of some known results does not always 
seem sufficiently stressed; in particular, the Wiener-Khintchine relation between 
the periodogram and correlogram is noted (section 30.68) as “an interesting relation”, 
whereas it is a fundamental relation in the modern method of approach 
to time-series, giving much deeper insight into the correct interpretation of 
classical periodogram analysis. 

These criticisms, which could be extended to cover minor errors and misprints, 
are not intended to detract seriously from what is a remarkable achievement. 
An excellent sense of proportion has been maintained throughout between 
mathematical theory and illustrative discussion and examples. This makes 
this treatise, if both the breadth and level of the subject matter are taken into 
account, at present unique. It will be an indispensable reference book to every 
teacher and advanced student of the theory of statistics. 


Sequential Analysis of Statistical Data: Applications. Prepared by the 
Statistical Research Group, Columbia University for the Applied Mathematics 
Panel, National Defense Research Committee, Office of Scientific Research 
and Development. SRG Report 255, Revised; AMP Report 80.2R, 
Revised. New York: Columbia University Press, September 1945. pp. vii, 
17, iv, 80; v, 57; iii, 25; iii, 18, iii, 39; ii, 41. $6.25. (London: Oxford University 
Press, 1946.) 

Reviewed by John W. Tukey 
Princeton University 

Many of the features of this compendium are familiar to most of the readers of 
this review, but for the benefit of the others I shall enumerate them briefly. It 
consists of a heavy looseleaf binder containing 7 booklets of distinctive colors, 
each saddle stitched and usable separately. It is the last word (to date) in presenting 
sequential analysis to the statistician who may wish to use it in practice. 
It covers five elementary cases (each in a booklet, the two others being used for 
introduction and appendices): 

Acceptance or rejection by percent defective (Sec. 2)

Comparative percent satisfactory (Double dichotomy) (Sec. 3)

Acceptance or rejection by the adequacy of the mean (with known variability) (Sec. 4)

Acceptance or rejection by the exact value of the mean (with known variability) (Sec. 5)

Acceptance or rejection by the smallness of the variability (Sec. 6)

These cases are covered in complete detail, with illustrative examples, tables and 
charts. A copy should be accessible to every teacher of statistics and to every 
statistician in industry or experimental work who can propose new techniques of 
testing. 

With this general introduction let us go on and explain what the reader will
not find and what further work in this line the reviewer awaits with keen interest.
The classical testing procedure was to test a sample of predetermined size and
then decide to accept or reject. Long ago curtailed sampling and double sampling
were developed to cut corners legitimately and reduce inspection costs.
There are two situations, each more frequent in war than in peace, where it is
clearly desirable to reduce the average number of items tested to a minimum:

(I) Where essentially all lots are accepted and the test is destructive so that the 
items tested are the main loss of production, or 
(II) Where the cost of testing an item is large in comparison with the cost of 
production. 

Subject to a practically unimportant allowance for the finite size of the lots, and 
to an allowance of unknown importance for the quality of lots presented, the 
methods of sequential analysis minimize this average number among all methods 
so far considered. When situation (I) or (II) holds without modifying complications,
then, the best known method is sequential analysis, the natural descendant
of double sampling. Otherwise, the situation is far from clear, and much judgement
is involved in setting up a practically efficient scheme. The reader will
get no help on this problem of judgement, nor in the problem of setting risks, from
the book under review; he will get every needed help with the mathematical
problem of setting up a sequential plan to meet chosen risks, including complete
tables of all necessary functions, including natural logarithms.
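
For readers who have not seen a sequential plan, the first case listed above (acceptance or rejection by percent defective) can be sketched in a few lines of Python. This is only an illustration based on Wald's sequential probability ratio test with his usual approximate boundaries; the function name `sprt_decide`, the quality levels `p0` and `p1`, and the risks `alpha` and `beta` are assumptions made for the example and are not taken from the report under review.

```python
import math


def sprt_decide(items, p0, p1, alpha, beta):
    """Sequentially inspect items (True = defective) and stop as soon as possible.

    p0: acceptable fraction defective, p1: rejectable fraction defective (p0 < p1).
    alpha: risk of rejecting a lot of quality p0; beta: risk of accepting quality p1.
    Returns (decision, number_of_items_inspected).
    """
    g1 = math.log(p1 / p0)              # log-likelihood step for a defective item
    g2 = math.log((1 - p0) / (1 - p1))  # size of the downward step for a good item
    reject_at = math.log((1 - beta) / alpha)   # Wald's approximate upper boundary
    accept_at = math.log(beta / (1 - alpha))   # Wald's approximate lower boundary

    llr = 0.0  # running log likelihood ratio of p1 against p0
    n = 0
    for n, defective in enumerate(items, start=1):
        llr += g1 if defective else -g2
        if llr >= reject_at:
            return ("reject", n)
        if llr <= accept_at:
            return ("accept", n)
    return ("continue", n)  # no boundary reached yet; more items are needed
```

With the illustrative values p0 = 0.01, p1 = 0.10 and both risks 0.05, an unbroken run of good items leads to acceptance at the 31st item, while two defectives in the first two items lead to rejection at once. The sample size is therefore not fixed in advance, which is the source of the saving in average inspection discussed above.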

There is no reason to suppose that sequential analysis is the last word in testing 
procedures for the general problem of efficient testing, but what should be the 
next step ahead is not a step for the mathematical statistician. What is needed 
now is a careful analysis, by the operational research techniques so useful during 
the war, of a half-dozen industrial testing situations to determine what properties 
of the testing procedure are involved in cost and to what extent. Do we want 
the minimum average sample size, the minimum average square of the sample 
size—or what? With this there should go a corresponding operational study of 
the advantages of different OC curves, including those of what now seems to be 
a peculiar shape. Given these studies, we could put the problem in mathematical
statistics to the mathematical statistician which he would then solve. But with 
the present lack of operational research groups in industry, it is probable that 
we will proceed in an unnatural way, and that the mathematical statistician will 
take the next step forward. For reasons of mathematical simplicity it is not 
unlikely that the sample plan with the minimum average squared sample size 
will come next. 

The credit for the book is clearly assigned on the inside cover of each pamphlet 
in the following words: “So many members of the Statistical Research Group
(Columbia) have participated in the preparation of this report, a previous edition
of which was prepared by H. A. Freeman, that its authorship is attributed
to the group as a whole. The responsibility for planning and preparing this
edition has been shared by H. A. Freeman, M. A. Girshick, and W. Allen Wallis,
with the cooperation of Kenneth J. Arnold, Milton Friedman, Edward Paulson,
and others. The theory of sequential analysis is mainly the work of A. Wald.”

It may be of interest to notice a few minor points for the record. On page
1.01 it is indicated that 100% inspection is 100% effective; this seems far from
industrial experience. Another badly needed set of operational studies would be
on the influence of the sampling plan on inspector’s inspection. On page 2.27,
the footnote suggests that when a tabular procedure is used instead of
a graphic one, more decimal places should be kept; the logic of this is not
clear. On page 4.14 it is stated that “similarly, if all patches had tested 400
minutes, the experiment would have terminated at 9.4.” Clearly no such
experiment can terminate after a fractional number of tests. On page A.09
it is stated that “Finally it should be mentioned that truncation of any kind
ought generally to be avoided”. This seems to the reviewer to be a rash statement,
for when not only average sample size but all other properties entering into
the practical efficiency of a sampling plan are considered, this decision will almost
certainly be reversed. The relatively small number of these detailed points is
an evidence of careful and competent workmanship.

A footnote to the Appendix (B) on some principles of sequential analysis states: 
“Any mathematician who may stray into this Appendix should be assured that 
the validity of the conclusions in no case depends upon the type of reasoning 
presented here; indeed, even for intuitive or heuristic arguments mathematicians 
may prefer those given in SRG 75”. This warning and caveat seems unduly 
strong—the appendix is recommended to all mathematically minded newcomers 
to sequential analysis. 

The same appendix warns the reader in a few places that the theory set forth 
does not allow for the fact that samples come in units. If the reader tries to 
apply the theory to cases far from normal inspection practice, for example with 
risks of 0.25 and average sample sizes of 12, he will then find out that this does 
occasionally make a difference. In conventional circumstances the approximation
will not bother him.



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Mr. Kurt W. Back has accepted a position with the Research Center for Group 
Dynamics, Massachusetts Institute of Technology. 

Mr. Stanley D. Canter was discharged from the Army in October and has been
enrolled as a graduate student in mathematical statistics at Columbia University.

Mr. William W. Cooper has accepted a position at Carnegie Institute of Technology,
Pittsburgh.

Mr. Robert Dorfman is enrolled as a graduate student in the Department of 
Economics, University of California, Berkeley, and is also serving as a teaching 
assistant in that department. 

Dr. Nicholas Fattu, formerly at Michigan State College, has accepted a teaching
position at Indiana University, Bloomington.

Mr. John P. Gill is now Chief of the Research and Progress Analysis Division,
War Assets Administration, Houston Regional Office, Texas.

Dr. Clausin D. Hadley has accepted a position with the Graduate School of
Business, Stanford University.

Mr. Malcolm H. Henry is now Assistant Statistician in the Statistical Department
of the Michigan State Department of Social Welfare, Lansing.

Dr. Alston S. Householder has accepted a position as Principal Physicist with
the Monsanto Chemical Company, Clinton Laboratories, Oak Ridge, Tennessee.

Mr. Morton Kramer is now with the Office of International Health Relations, 
U. S. Public Health Service, Washington. 

Mr. E. C. Leone, who was discharged from the military service in the fall, has
returned to his former position in the Department of Mathematics at Purdue
University, Lafayette, Indiana.

Mr. Philip J. McCarthy, formerly at Princeton University, is now at Cornell
University, Ithaca, New York.

Mr. Edward C. Molina has been named special lecturer in Mathematics at
Newark College of Engineering, in addition to Dr. Emil J. Gumbel, previously
mentioned.

Mr. Nicholas Pastore has accepted a position in the Department of Mathematics,
City College of New York.

Dr. William S. Robinson is now Assistant Professor of Sociology and Statistics, 
University of California at Los Angeles. 

Dr. Leonard J. Savage, who has a Special Rockefeller Fellowship, is spending
the academic year at the Institute of Radiobiology and Biophysics, University of
Chicago.

Professor Dunham Jackson died at Minneapolis on November 6, 1946. From
1919 until 1946 Mr. Jackson was Professor of Mathematics at the University of
Minnesota, and in 1946 was named Professor Emeritus.

Professor Charles C. Wagner died suddenly on May 23, 1946, at the age of 62.
He was acting dean of the College of Liberal Arts of Pennsylvania State College 
when he died. 

Those interested in the work of the Mathematical Tables Project, will, upon 
request, be placed on the mailing list for copies of the monthly progress reports, 
issued by the Project. Requests should be addressed to Dr. Arnold N. Lowan, 
150 Nassau St., New York, N. Y. 


Statistical Research Laboratory, University of Michigan 

Several developments in instruction and research in the general field of statistics
are in progress at the University of Michigan.

At the beginning of the current academic year the new Statistical Research
Laboratory was opened. It is planned that this unit, which is a division of the
Graduate School, will serve as the center for research employing statistical methods
and for research in statistical methodology. Free consultation and advice
on statistical matters are offered to all members of the University engaged in
research, and the latest types of computing machines are available for their use at
no cost to them. Alternatively, the Laboratory will undertake, at fees to cover costs,
computing and the analysis of data for such individuals or units of the University.
The Laboratory will have available the services of the University's completely
equipped Sorting and Tabulating Station and expects to continue to provide a
center for the most efficient computing service as improved machines are developed.
The technical assistants employed by the Laboratory will be advanced
students of statistics who will thus have the opportunity to supplement their
training with experience with actual statistical investigations. Professor
C. C. Craig as Director and Professor P. S. Dwyer are in charge of the new laboratory,
each on a half-time basis.

The new Laboratory is a research and not a teaching unit and is distinct from 
the large statistical laboratories for the use of students in statistics courses already 
in existence on the campus. With respect to instruction in theoretical statistics, 
the curriculum in that subject in the Mathematics Department has recently been 
revised and extended to include twenty-four semester hours at the undergraduate 
and graduate levels in addition to courses in probability, finite differences, 
graphical methods, and quality control. The somewhat related professional 
program in actuarial mathematics has likewise been strengthened. The teaching 
staff for these two curricula includes Professors H. C. Carver, A. H. Copeland,
C. C. Craig, P. S. Dwyer, C. H. Fischer, and C. J. Nesbitt.

A number of postwar research programs whose pursuit involves the use of 
probability and statistical methods have been established at the University of
Michigan. Of especial interest is the new Survey Research Center under the
leadership of Professors R. Likert and A. A. Campbell who will continue activities
begun by their group in Washington in the Department of Agriculture. Research
by survey methods in the social sciences for public and private agencies
and in survey methods themselves will be pursued and in addition a training 
program combining formal courses and apprenticeship in the Center is being 
set up. 


New Members 

The following persons have been elected to membership in the Institute:

Albert, George E., Ph.D. (Wisconsin) Head, Mathematics Division, Research Dept., Naval
Ordnance Plant, Indianapolis, Ind., 1104 N. Oakland Ave.

Ament, Richard P., B.A. (Cornell) Scientific Aid, 2129 20th St., N., Arlington, Va.

Bennett, Myra S. (Mrs. C. A.), A.B. (Michigan) Office Mgr., Institute of Math. Stat.,
Rackham Bldg., Ann Arbor, Mich., P. O. Box #8, Saline.

Blankmeyer, Edith, A.B. (Western College) Stat., Res. Dept., National Broadcasting Co.,
30 Rockefeller Plaza, New York 20, N. Y.

Blyth, Colin, Jr., M.A. (Queen’s Univ. and Univ. of Toronto) Graduate student, Univ. of
N. Car., Chapel Hill, N. Car., 209 Mangum Dormitory.

Brown, Philip, B.S. (Pittsburgh) Stat., R 329 Standard Oil Bldg., 3rd and Constitution
Aves., Washington, D. C.

Bruno, O. P., B.M.E. (New York Univ.) Chief, Methods Section, Ballistic Research Labs.,
Aberdeen Proving Ground, Md.

Carrier, Norman H., M.A. (Cantab) Civil Servant, Mathematical Statistics Section of
Chief Scientific Advisers Division, Ministry of Works, c/o Westminster Bank, Palmers’
Green, N. 13, London, England.

Chand, Uttam, M.A. (Punjab Univ., India) Graduate student, Univ. of N. Car., Chapel
Hill, N. Car., 112 Mangum Dormitory.

Crow, Edwin L., Ph.D. (Wisconsin) Mathematician, Science Dept., Res., Devel., and Test
Organization, USNOTS, Inyokern, Calif.

Dang, Mary, M.A. (California) Graduate student, Columbia University, New York 27,
N. Y., Box 267, Johnson Hall.

Ens, Catherine C., B.S. (Dayton) Stat. Res. Ass’t, Graduate School, Ohio State University,
Columbus, Ohio, 267 Fifteenth Ave., Columbus 10.

Fox, William H., Ph.D. (Indiana) Ass’t Prof. of Educ. and Ass’t Director of Res. and
Field Service, Indiana Univ., Bloomington, Ind., 729 E. Hunter.

Geisler, Murray A., M.A. (Columbia) Operations Analyst, Headquarters Army Air
Forces, 222 N. Piedmont St., Arlington, Va.

Gershenson, Charles P., B.B.A. (C.C.N.Y.) Res. Assoc., Institute of Psychological Res.,
Box ISO, Teachers College, New York 27, N. Y.

Gilford, Leon, A.B. (Brooklyn) Econ. Analyst, Census Bureau, Washington, D. C., 1410
19th St., S. E.

Goudsmit, S. A., Ph.D. (Leyden) Prof. of Physics, Northwestern Univ., Evanston, Ill.

Halperin, Max, M.S. (Iowa) Graduate student, Univ. of N. Car., Chapel Hill, N. Car., 211
Ho. Columbia

Halperin, Sidney L., Ph.D. (Ohio State) Psychologist, Neuropsychiatric Institute, Univ. of
Mich. Hospital, Ann Arbor, Mich., 21fil Pittsfield Blvd., Pittsfield Village.

Herbach, Leon H., A.B. (Brooklyn) Sub. Instr., Dept. of Math., Brooklyn Coll., N. Y.,
1926 64th St., Brooklyn 4.

Hoeffding, Wassily, Ph.D. (Berlin) 161 West 88 St., New York 24, N. Y.

Huhndorff, Roland F., B.S. (St. Mary’s Univ.) Ass’t to Ass’t Chief Chemist, The Texas
Co., Res. Lab., Port Arthur, Texas.

James, William C., A.B. (Knox Coll.) Director, Stat. Div., National Safety Council, 20
N. Wacker Dr., Chicago 6, Ill., 78SS So. Dobson Ave., Chicago 10.

Lev, Joseph, Ph.D. (Cornell) Ass’t Civil Service Examiner, N. Y. C. Civil Service
Comm., and Lecturer, Teachers College, Columbia Univ., N. Y., SBSO Forest Parkway,
Woodhaven 21.

Linder, Arthur, Ph.D. (Bern) Prof. of applied math. stat., University of Geneva, Switzerland,
Avenue de Champel 24.

Lord, Frederic M., M.A. (Minnesota) Ass’t Director, Graduate Record Examination, 437
West 59th St., New York 19, N. Y., IBS W. 63rd St.

Marshall, Herbert, B.A. (Toronto) Dominion Stat., Dominion Bureau of Statistics,
Ottawa, Canada.

Meacham, Alan D., Supv., Sorting and Tabulating Station, and Lecturer, School of Bus.
Adm., Univ. of Mich., Ann Arbor, Mich., 114 Rackham Bldg.

Miller, Irving, B.S. (C.C.N.Y.) Stat., Bureau of Labor Stat., Washington, D. C., 1900
Biltmore St., N. W., Washington 3.

Nanda, D. N., M.A. (Agra, India) Graduate student, Univ. of N. Car., Chapel Hill, N.
Car., Dept. of Statistics.

Pines, Sylvia F., M.A. (Michigan) Instr., Math. and Stat., 43-17 48th St., Long Island
City 4, N. Y.

Quastler, Henry, M.D. (Vienna) Medical Radiologist, Carle Hospital Clinic, Urbana, Ill.,
612 W. Nevada.

Reiersol, Olav, Ph.D. (Stockholm) Teacher of stat., Univ. of Oslo, Oslo, Norway, International
House, 500 Riverside Dr., New York 27, N. Y.

Romanovsky, Vsevolod I., Ph.D. (Moscow) Prof. at the Univ. and Member of the Academy
of Sciences, Tashkend, U. S. S. R.

Rust, Charles H., S.J., M.A. (St. Louis) Graduate student, St. Louis Univ., St. Louis,
Mo., 221 N. Grand Blvd., St. Louis 3.

Seal, Hilary L., B.Sc. (Univ. Coll., London) Head of Stat. Branch, Room 2, Old Bldg., G.,
Admiralty, Whitehall, London, S. W. 1, England.

Serbein, Oscar N., Jr., M.S. (Iowa) Graduate student, Columbia Univ., New York 27,
N. Y., Army Hall, Rm S3SH, 1560 Amsterdam Ave., New York 31.

Sholl, D. A., B.Sc. (London) Stat. in Math. Stat. Section of Chief Scientific Adviser’s
Div., Ministry of Works, 81 Lynmouth Ave., Bush Hill Park, Enfield, Middlesex,
England.

Siegel, Irving H., M.A. (New York) Chief, Economics Div., Veterans Adm., Washington,
D. C., 5407 9th St., N. W., Washington 11.

Sitgreaves, Rosedith, M.A. (Geo. Washington) Ass’t Stat., U. S. Public Health Service
(on leave), Graduate student, Columbia Univ., New York 27, N. Y., Johnson Hall,
411 W. 116th St.

Tama, Joseph, B.A. (Washington) Pfc. U. S. Army, 5260 TIC; GHQ AFPAC; APO 600,
c/o Postmaster, San Francisco, Calif.

Tate, Merle W., Ed.M. (Harvard), M.A. (Montana) Assoc. Prof. of Educ., Hamilton
Coll., Clinton, N. Y.

Thrall, Robert M., Ph.D. (Illinois) Ass’t Prof. of Math., Univ. of Mich., Ann Arbor,
Mich., 958 Spring St.

Vaughn, Kenneth W., Ph.D. (Iowa) Director, Graduate Record Examination Office of the
Carnegie Foundation for the Advancement of Teaching, and Assoc. Director of Cooperative
Test Service of Amer. Council on Educ., 437 West 59th St., New York 19, N. Y.

Wallace, Clifford A., Sup’t of Quality, Camera Works, Eastman Kodak Co., 333 State St.,
Rochester, N. Y.

Wilkins, J. Ernest, Jr., Ph.D. (Chicago) Mathematician, American Optical Co., S. I. D.,
Box A, Buffalo 15, N. Y.

Wilkinson, Roger I., B.S.E.E. (Iowa State) Member Technical Staff, Bell Telephone Labs.,
463 West St., New York, N. Y.



REPORT ON THE BOSTON MEETING OF THE INSTITUTE 


The twenty-fourth meeting of the Institute of Mathematical Statistics was held 
at the Hotel Statler, Boston, Massachusetts, on Saturday, December 28, 1946. 
The meeting was held in conjunction with the One Hundred Thirteenth Annual 
Meeting of the American Association for the Advancement of Science. The
following 45 members of the Institute attended the meeting:

K. J. Arnold, M. S. Bartlett, W. D. Baten, C. I. Bliss, G. W. Brier, G. W. Brown, T. H.
Brown, B. H. Camp, C. W. Churchman, W. G. Cochran, J. H. Curtiss, D. B. DeLury, P. V.
Dorweiler, Churchill Eisenhart, Benjamin Epstein, H. A. Freeman, Hilda Geiringer, H. H.
Germond, J. A. Greenwood, Boyd Harshbarger, W. A. Hendricks, E. H. C. Hildebrandt,
W. C. Jacob, H. B. Kartz, L. F. Knudsen, Walter Leighton, A. J. Lotka, J. W. Mauchly,
Margaret Merrell, E. B. Mode, Frederick Mosteller, C. M. Mottley, Doris Newman, R. H.
Noel, H. W. Norton, Otis Pope, C. J. Rees, C. F. Roos, P. J. Rulon, J. W. Tukey, W. M.
Upholt, F. M. Wadley, C. L. Weaver, C. P. Winsor, W. J. Youden.

At the morning session, a joint session with the Biometrics Section of the 
American Statistical Association, the following program was presented with 
Professor E. B. Wilson of Harvard University as chairman:

Topic: The Analysis of Variance in Biology

Papers: The Assumptions Underlying the Analysis of Variance

Professor Churchill Eisenhart, University of Wisconsin and The National
Bureau of Standards

Some Consequences when the Assumptions are not Satisfied

Professor W. G. Cochran, North Carolina State College

The Use of Transformations

Professor M. S. Bartlett, Cambridge University and the University of
North Carolina

Discussion: Professor Boyd Harshbarger, Virginia Polytechnic Institute
Dr. W. C. Jacob, Long Island Vegetable Research Farm
Professor C. P. Winsor, Johns Hopkins University
Dr. W. J. Youden, Boyce Thompson Institute

The program for the afternoon session, also a joint session with the Biometrics 
Section, under the chairmanship of Dr. E. J. DeBeer, Wellcome Research 
Laboratories, was as follows: 

Topic: The Analysis of Variance in Biology (continued)

Papers: The Analysis of Covariance

Professor D. B. DeLury, Virginia Polytechnic Institute

Discriminant Functions

Professor George W. Brown, Iowa State College

Discussion: Professor W. D. Baten, Michigan State College
Professor C. I. Bliss, Yale University
Mr. W. A. Hendricks, U. S. Department of Agriculture

P. S. Dwyer,
Secretary.



ANNUAL REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1946 

New Opportunities 

The return to peacetime conditions presents the Institute with new opportunities
for expanding its activities and usefulness. An increased appreciation
for mathematical statistics has followed the many contributions made by our
members to the war effort. The numerous societies interested in specific applications
of statistics have come to look to the Institute both for leadership in
theory and for playing its part in the dissemination of new results. As a result of
the drastic interruption in the normal training of students during the war, there
is unusually keen competition for the services of capable statisticians. Those of
our members who are engaged in teaching are responsible for the execution of a
vigorous training program to meet current and future demands promptly and
without sacrifice of quality. In short, we are in a position, as never before, to
advance the development and efficient use of mathematical statistics. The following
account of some of our activities during the year will indicate, I believe,
that the record is creditable. Yet in many instances what has been accomplished 
is only a beginning. 


Meetings 

The Development Committee has repeatedly stressed the desirability of an 
extension in our customary schedule of meetings in order to provide additional 
contacts between mathematical statisticians and the users of statistics. Owing
to the greater availability of railway and hotel accommodation in 1946, we obtained
our first opportunity to put this extension into effect. The regular winter
meeting with the American Statistical Association and other social science organizations
was resumed at Cleveland in January, while the late summer meeting
with the mathematicians took place at Cornell in September. In addition, two
meetings were held with different sections of the American Association for the
Advancement of Science, at St. Louis in March and at Boston in December.
On both occasions the programs were expository and attracted large audiences.
Finally, at the invitation of Princeton University, a one-day meeting at Princeton
in November was devoted to the analysis of variance. While no joint sessions
were conducted with engineering or industrial societies, several of our members
took prominent parts in the programs of such societies.

For the near future, it seems desirable to continue the practice of meeting in 
the winter with the ASA and social science groups and in the summer with the 
mathematical groups. In 1947 these meetings will be at Atlantic City, January 
24-27 and at Yale, September 1-5 respectively. It is not known whether conditions
in future years will produce a return to Christmas rather than January
meetings; for the present the hotel situation swings the balance in favor of
January.

In 1946 the membership of the program committee was enlarged so that it
would be better equipped to arrange joint meetings with other societies. We 
owe our thanks to the members for their successful efforts in the face of difficulties 
which still attend the planning of a meeting. 

Annals 

Despite the scarcity of manuscripts in the later stages of the war, our editor, 
Professor S. S. Wilks, succeeded throughout in maintaining the annual volumes
of the Annals at their usual size. During 1946, scarcity gave way to plenty.
The number of papers of good quality submitted in recent months is sufficiently
great that there will be more than enough, by current estimates, to fill the 1947
volume. To narrow the scope of the Annals or to reject good papers would be
undesirable. Accordingly, the Directors have authorized an increase of 100
pages in the 1947 volume if this is necessary to insure the publication of all acceptable
papers.

A gratifying testimony to the prominence of the Annals in its field is the marked 
increase in the demand for back numbers. Our Secretary-Treasurer reports 
that sales amounted to $3,235. To meet actual or anticipated orders, eleven 
issues were reprinted during 1946 at a cost of $2,809. 

For most members of the Institute, even those who serve on the Board, work 
on Institute affairs occupies only a minor portion of our time. The editor is
never free from some forthcoming publication deadline. Initial perusal of manuscripts,
selection of referees, editorial decisions, handling of the production phases
of publication and much miscellaneous correspondence (not all of it pleasant) 
make editorial work a daily preoccupation, year in and year out. An annual 
word of thanks is an inadequate expression of our indebtedness to Professor 
Wilks. 


Membership and Finance 

At the beginning of 1945 there were 606 members. A year later this figure 
had increased to 777 and at the end of 1946 the figure stood at 900. A fifty per cent
increase in two years is another evidence of the healthy growth of the Institute.
It has been attained to a considerable extent through the hard work of
our Secretary-Treasurer, P. S. Dwyer and the cooperation of individual members. 

The Secretary-Treasurer also reports a very satisfactory net gain in assets 
of $2,627 during the year. Nevertheless, financial problems may arise in the 
near future. Printing and other costs have risen sharply, and the printing of an 
enlarged Annals will be an additional drain on our resources. Both the Membership
and Development Committees have given some thought to the need for
additional revenue that may face us soon. They have recommended consideration
of the possibility of Institutional Memberships, a device that has been found
satisfactory by some other societies. A continued growth in membership will 
also help greatly to finance expanded activities. 

Committees

Inter-society affairs. The report of the 1944 Committee on Development, stressing
the need for closer cooperation amongst the various societies interested in
statistics, provided the stimulus for active efforts in this direction. A meeting
of representatives of these societies was called early in 1945 at the invitation of
the American Statistical Association. This meeting suggested that a reconstitution
of the ASA might enable it to become the central binding organization.
Accordingly, a committee of the ASA has worked for a considerable time on a 
revision of the ASA constitution, which it is intended to submit to the votes of 
ASA members early in 1947. The new constitution provides for representation 
from other societies on the Council of the ASA, should these societies decide to 
associate or affiliate with the ASA. 

From our own point of view, it has seemed wise to delay action on certain
internal affairs while awaiting the outcome of these developments in the ASA.
Thus a statement of policy with regard to the formation of chapters of the IMS
is needed and the problem has been considered both by a special committee in
1945 and by the Development Committee in 1946. The latter committee recommends
that no decision be made pending examination of the provisions for joint
sponsorship of local and regional chapters in the new ASA constitution. Similarly,
our own Committee on Revising the Constitution and By-Laws has deferred
a final report until the attitude of our members towards the new developments
can be expressed. It is to be hoped that decisions can be taken in 1947.

Tabulation: The advances made in recent years in the construction of new types
of computing equipment justified an enlargement of our Committee on Tabulation,
which now includes experts both on the building of machines and on the
calculation and use of tables. The committee plans to keep our members informed
of progress in this field.

Government Service: Dr. W. Edwards Deming served as chairman of a new committee
on Mathematical Statistics and Statisticians in the Government Service.
Although the federal government employs many mathematical statisticians,
explicit recognition of the profession is lacking in many instances. As has happened
in other fields, statisticians are sometimes officially classed as economists
and little provision is made for mathematical statisticians in recruitment policies.
Moreover, it is probable that a number of branches of the government, at present
unaware of the functions of a statistician, could employ several with profit.
The new committee will endeavor to insure that mathematical statistics is recognized
and effectively utilized in the federal service.

Assistance to libraries. Like other professional societies, the Institute has received
a number of appeals from libraries in war areas whose periodicals were
looted or destroyed during the war. After careful consideration, the Board
decided that official action should be limited to the free provision of missing
copies of the Annals to all former subscribers who intend to renew subscriptions
for the future. In addition, a committee with Professor J. Neyman as chairman
was appointed to establish a procedure by which gifts of individual members
(books, reprints, back numbers of the Annals or cash for the purchase of back
numbers) could be handled. At the suggestion of this committee a general
appeal for the small sum of 50 cents per member was circulated with the December
billing. Individual collections are also being made at certain centers.

Teaching: The Committee on Teaching has not made as much progress as it 
would have liked, owing to the dispersal of its members and the taking up of new 
civilian posts. Members have, however, cooperated with the Committee on
Applied Mathematical Statistics of the National Research Council, which is 
engaged on a somewhat similar survey. 

Rietz lecture: The first lecturer in the new series of lectures in honor of the late
Henry Lewis Rietz will be Professor A. Wald. His topic will be “Sequential
Estimation and Multi-Decisions”. The lecture will be delivered in connection
with the Yale meetings, September 1947.

Representatives: In addition to its committee work, the Institute cooperates, 
through representatives, with the Division of Physical Sciences of the National 
Research Council, the Joint Committee for the Development of Statistical Applications
in Engineering and Manufacturing, the American Association for the
Advancement of Science, the Inter-Society Committee on Federation and the
Policy Committee for Mathematics. The last committee, which was appointed
in 1946, will consider important problems that affect the mathematics profession 
as a whole. 

Nominations: The Committee on Nominations, consisting of Professor P. R.
Rider (chairman), Professor B. H. Camp and Professor G. M. Cox, has made the
following nominations for officers in 1947.

President            W. Feller
Vice-Presidents      J. H. Curtiss
                     M. H. Hansen
Secretary-Treasurer  P. S. Dwyer

While it is perhaps improper to comment on nominations, I should like to 
express my personal appreciation of Professor Dwyer’s action in being willing 
to offer himself for re-nomination as Secretary-Treasurer. The successful operation 
of the Institute rests mainly on the Secretary-Treasurer, and the demands 
of the Office are even more continuous and exacting than those on the editor. 
Professor Dwyer’s splendid work during his first three years of office, carried on 
at considerable sacrifice of his research interests, deserves the best thanks and 
appreciation of every member. 

In conclusion, it is a pleasure to express my sincerest thanks to all committee 
chairmen and members and to all representatives for their excellent work for the 
good of the Institute, and to all Institute members for their loyal support. 

W. G. Cochran, 
President, 1946. 

Committees of the Institute 

Committee                          Personnel 

Development                        E. G. Olds (chairman), C. I. Bliss, M. A. 
                                   Girshick, F. C. Mosteller, P. S. Olmstead, 
                                   H. Scheffé 

Membership                         W. Feller (chairman), C. C. Craig, P. A. 
                                   Horst, T. Koopmans 

Program                            J. H. Curtiss (chairman), M. Friedman, B. 
                                   Harshbarger, W. N. Hurwitz, A. M. Mood, 
                                   F. C. Mosteller, J. W. Tukey 

Mathematical Statistics and        W. E. Deming (chairman) 
Statisticians in the Government 
Service 

Revising the Constitution          M. H. Hansen (chairman), C. I. Bliss, A. T. 
and By-Laws                        Craig, J. H. Curtiss, W. Shewhart 

Tabulation                         C. Eisenhart (chairman), P. S. Dwyer, H. 
                                   Goldstine, A. N. Lowan, H. W. Norton, G. R. 
                                   Stibitz 

Teaching                           H. Hotelling (chairman), W. Bartky, W. E. 
                                   Deming, M. Friedman 

Nominations                        P. R. Rider (chairman), B. H. Camp, G. M. 
                                   Cox 

Finance                            P. S. Dwyer (chairman), L. A. Knowler, C. F. 
                                   Roos, F. F. Stephan 

Subscription to Purchase Annals    J. Neyman (chairman), W. Feller, P. L. Hsu 
for Countries Devastated by War 

Society Representatives 

Inter-Society Committee on         J. H. Curtiss, P. S. Olmstead 
Federation 

Policy Committee for Mathematics   W. Feller 

Joint Committee for the            F. C. Mosteller, S. S. Wilks 
Development of Statistical 
Applications in Engineering 
and Manufacturing 

American Association for the       G. W. Snedecor 
Advancement of Science 

Division of Physical Sciences,     W. Bartky 
NRC 



REPORT OF THE SECRETARY-TREASURER OF 
THE INSTITUTE FOR 1946 

The Institute of Mathematical Statistics held five meetings during 1946, at 
Cleveland on January 24-27, at St. Louis on March 30, at Ithaca on August 
22-23, at Princeton on November 1, and at Boston on December 28. 

The large number of meetings has necessitated frequent mailings to the 
membership. Memoranda to members, with appropriate enclosures, were sent 
out in January, March, June, July, October, and November. 

The Secretary-Treasurer wishes to acknowledge the cooperation of the members 
of the Institute in paying bills promptly, in considerable activity leading to 
an increase in membership, and in general looking after the interests of the 
Institute. 

At the beginning of 1946 the Institute had 777 members. During the year 
180 new members joined the Institute, an increase of 23%. However, during 
1946 the Institute lost 57 members. Of these, 15 resigned, 37 were dropped for 
non-payment of dues, and 5 are deceased. Some of the 37 dropped we have 
been unable to contact, and it is very probable that, in some cases, membership 
will be resumed in the future. The net increase in members during the year was 
123, or about 16%, making a total of 900 members. 
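
The membership figures in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the report; the variable names are illustrative and not part of the report.

```python
# Figures as stated in the Secretary-Treasurer's report for 1946.
members_start = 777
joined = 180
resigned, dropped, deceased = 15, 37, 5

lost = resigned + dropped + deceased          # members lost during the year
net_increase = joined - lost                  # net change in membership
members_end = members_start + net_increase    # membership at the end of 1946

# Percentages are taken relative to the membership at the start of the year.
pct_joined = round(100 * joined / members_start)
pct_net = round(100 * net_increase / members_start)

print(lost, net_increase, members_end)  # 57 123 900
print(pct_joined, pct_net)              # 23 16
```

The stated percentages (23% and about 16%) agree with the counts when both are taken on the opening membership of 777.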

The following members died during the year: 

Professor 0.1?. Banos 
Professor S. A. Cudmore 
Professor Dunham Jackson 
Dr. Walter F. Schilling 
Professor C. C. Wagner 

The office of the Secretary-Treasurer sent a reprint of an Annals article and 
information about the Institute to 1800 persons interested in Quality Control. 
At least 28 of the new members became members as a result of this drive. As a 
continuation of a campaign started in 1945, the Institute also sent literature 
about the Annals to several hundred libraries and laboratories. 

The Secretary-Treasurer wishes to acknowledge the continued assistance of 
Professor Lloyd Knowler in caring for the back issues of the Annals which are 
stored at Iowa City. 

A few comments about the financial statement which appears below are in 
order. In addition to the increase in membership, mentioned above, the chief 
rise in income resulted from the unprecedented sales in back issues which 
amounted to $3,234.88, an increase over the preceding year (the previous high) 
of 86%. These heavy sales, however, depleted the supplies of many of our early 
issues, so that we were forced to reprint eleven of those issues and also the cumulative 
index during the year. This cost $2,809.00 (for 500 copies of each) and indicates 
that a much larger portion of our assets is in inventory, as shown in Exhibit D. 

Following the instructions of the Finance Committee, Professor H. C. Carver 
was paid for his share of all issues in which he and the Institute had joint ownership. 

Nine members have paid life memberships during the year, increasing the 
total of life membership funds by $812.50. 

The net gain in assets of $2,627.23 is very satisfactory even though this gain 
is evident in increased inventory and not in a better cash position. 


FINANCIAL STATEMENT 
December 31, 1945, to December 31, 1946 

A. Receipts 

Balance on Hand, December 31, 1945                 $7,548.22 
Dues                                                4,638.40 
Life Membership Payments                              812.50 
Subscriptions                                       2,057.54 
Sale of Back Numbers                                3,234.88 
Income from Investments                               150.00 
Miscellaneous                                         121.29 

Total                                             $18,562.83 

B. Expenditures 

Annals: Current 
    Office of Editor                                   $125.00 
    Waverly Press                                     4,566.27 
                                                                 $4,691.27 
Annals: Back Numbers 
    Purchase from H. C. Carver                          644.50 
    Reprinted 500 copies                              2,809.00 
        Vol. I #1, II #2, II #3, III #3, IV #?, VII #3, VII #2, 
        VIII #1, 2, 3, 4, Cumulative Index 
    Iowa City Office                                     41.46 
    Binding                                              68.00 
                                                                  3,562.96 
Office of President                                                  25.62 
Mathematical Reviews                                                100.00 
Office of the Secretary-Treasurer 
    Printing, mimeographing, programs, etc. 
      (including stamped envelopes)                    $967.14 
    Printing 1800 copies of Wald-Wolfowitz article      140.00 
    Postage and supplies                                375.00 
    Clerical help                                     1,420.25 
                                                                  2,902.39 
Miscellaneous                                                        39.04 
Balance on Hand, December 31, 1946 (Cash and Bonds)               7,241.55 

                                                                $18,562.83 

C. Summary of Receipts and Expenditures 

Balance on Hand,* December 31, 1945                $7,548.22 
Receipts during 1946                               11,014.61 
Expenditures during 1946                           11,321.28 
Balance on Hand,* December 31, 1946                 7,241.55 
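
The summary in Exhibit C is a single balance identity: opening balance plus receipts less expenditures equals closing balance. A minimal check, using the four amounts listed above and exact decimal arithmetic (the variable names are illustrative only):

```python
from decimal import Decimal

# Amounts as listed in Exhibit C, in dollars.
balance_1945 = Decimal("7548.22")
receipts_1946 = Decimal("11014.61")
expenditures_1946 = Decimal("11321.28")
stated_balance_1946 = Decimal("7241.55")

# Opening balance + receipts - expenditures should give the closing balance.
computed_balance_1946 = balance_1945 + receipts_1946 - expenditures_1946

print(computed_balance_1946)                         # 7241.55
print(computed_balance_1946 == stated_balance_1946)  # True
```

`Decimal` is used rather than binary floating point so that the comparison in cents is exact.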


D. Comparison of Assets on December 31, 1945 and December 31, 1946 

                                                         1945          1946 
U.S. Government G Bonds                             $6,000.00     $5,000.00 
Life Membership Funds:  Bonds                          888.00      1,888.00 
                        Bank Deposits                  327.00        139.50 
Additional Bank Deposits                               333.22        214.05 
Current Accounts Receivable                            255.35        452.62 
Estimated Value (Cost) of back issues of Annals      4,497.95      7,234.58** 

Total                                              $12,301.52    $14,928.75 

Net Gain 1946                                                      2,627.23 

E. Liabilities of Institute of Mathematical Statistics as of December 31, 1946 

All bills which have been presented have been paid and there are no outstanding accounts 
against the Institute. The $2,027.50 in Life Membership payments require the Institute to 
provide the privileges of membership for life for the 20 members who have made payments. 
Also, $2,686.71 should be credited to 1947 dues and subscriptions. 

Paul S. Dwyer 
Secretary-Treasurer. 

December 31, 1946 


* In form of bank deposit and government bonds. 

** Value of Annals calculated at 67 cents per copy, and based on physical inventory. 





ANNUAL REPORT OF THE EDITOR FOR 1946 

During 1946 there was a considerable increase in the number of manuscripts 
submitted to the Editorial Committee of the Annals. A total of 49 papers, including 
18 short notes, were published in the 1946 volume of the Annals. The 
publication of these papers together with various official reports of the Institute 
and the Directory of the Institute required a total of 555 pages. Plans are already 
under way to expand the 1947 volume of the Annals to 600 pages. 

During recent years there has been a very noticeable broadening of interest 
in the field of probability and statistical theory on the part of readers and contributors 
to the Annals. Contributors to the 1946 volume came from university 
departments of astronomy, biology, mathematics, sociology and statistics; from 
Army, Navy and other government groups; and from industrial laboratories and 
quality control departments. More recently, contributions have been received 
from physicists, chemists and other groups. More contributions are now being 
received from overseas than in previous years. Every effort is being made to 
keep the Annals balanced with respect to these various directions of interest in 
probability and statistical problems. It is believed that one of the most effective 
things which could be done for the readers of the Annals is to publish expository 
articles from time to time on new fields of development in probability 
and statistical theory. Invitations have been accepted by several individuals to 
prepare such articles. 

Dr. Thornton C. Fry has asked to be relieved from the Editorial Committee, 
as of January 1, 1947. The Editor wishes to take this opportunity to 
express his gratitude for the service which Dr. Fry has rendered in connection with 
the editorial work on the Annals during the past nine years. 

On behalf of the Editorial Committee for the Annals, the Editor wishes to 
acknowledge with thanks the refereeing assistance which has been provided by 
the following persons during 1946: R. L. Anderson, T. W. Anderson, David 
Blackwell, Z. W. Birnbaum, K. L. Chung, W. J. Dixon, J. L. Doob, M. A. 
Girshick, T. E. Harris, L. Henkin, M. Kac, Irving Kaplansky, Bradford F. Kimball, 
T. Koopmans, H. Levene, H. B. Mann, P. J. McCarthy, F. C. Mosteller, H. 
E. Robbins, D. F. Votaw, J. E. Walsh and C. P. Winsor. The Editor is also 
indebted to the following individuals at Princeton University for preparation of 
manuscripts for the printer, and other editorial assistance: Mrs. Gladys B. Huling, 
Mrs. Eleanor C. Schoenly and J. E. Walsh. 

S. S. Wilks 
Editor. 

December 31, 1947 

CONSTITUTION 

OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 

ARTICLE I 
Name and Purpose 

1. This organization shall be known as the Institute of Mathematical Statistics. 

2. Its object shall be to promote the interests of mathematical statistics. 

ARTICLE II 

Membership 

1. The membership of the Institute shall consist of Members, Fellows, Honorary 
Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others, Junior 
Members excepted, who have been members for twenty-three months prior to the date 
of voting. 

3. No person shall be a Junior Member of the Institute for more than a limited term as 
determined by the Committee on Membership and approved by the Board of Directors. 

ARTICLE III 

Officers, Board of Directors, and Committee on Membership 

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a 
Secretary-Treasurer. The terms of office of the President and Vice-Presidents shall be one 
year and that of the Secretary-Treasurer three years. Elections shall be by majority 
ballots at Annual Meetings of the Institute. Voting may be in person or by mail. 

(a) Exception. The first group of Officers shall be elected by a majority vote of the 
individuals present at the organization meeting, and shall serve until December 31, 1936. 

2. The Board of Directors of the Institute shall consist of the Officers, the two previous 
Presidents, and the Editor of the Official Journal of the Institute. 

3. The Institute shall have a Committee on Membership composed of a Chairman and 
three Fellows. At their first meeting subsequent to the adoption of this Constitution, the 
Board of Directors shall elect three members as Fellows to serve as the Committee on 
Membership, one member of the Committee for a term of one year, another for a term of 
two years, and another for a term of three years. Thereafter the Board of Directors shall 
elect from among the Fellows one member annually at their first meeting after their election 
for a term of three years. The President shall designate one of the Vice-Presidents as 
Chairman of this Committee. 


ARTICLE IV 
Meetings 

1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such 
time as the Board of Directors may designate. Additional meetings may be called from 
time to time by the Board of Directors and shall be called at any time by the President 
upon written request from ten Fellows. Notice of the time and place of meeting shall be 
given to the membership by the Secretary-Treasurer at least thirty days prior to the 
date set for the meeting. All meetings except executive sessions shall be open to the 
public. Only papers accepted by a Program Committee appointed by the President may 
be presented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board 
may be held from time to time at the call of the President or any two members of the 
Board. Notice of each meeting of the Board, other than the two regular meetings, 
together with a statement of the business to be brought before the meeting, must be 
given to the members of the Board by the Secretary-Treasurer at least five days prior to 
the date set therefor. Should other business be passed upon, any member of the Board 
shall have the right to reopen the question at the next meeting. 

3. Meetings of the Committee on Membership may be held from time to time at the call 
of the Chairman or any member of the Committee provided notice of such call and the 
purpose of the meeting is given to the members of the Committee by the Secretary- 
Treasurer at least five days before the date set therefor. Should other business be passed 
upon, any member of the Committee shall have the right to reopen the question at the 
next meeting. Committee business may also be transacted by correspondence if that 
seems preferable. 

4. At a regularly convened meeting of the Board of Directors, four members shall 
constitute a quorum. At a regularly convened meeting of the Committee on Membership, 
two members shall constitute a quorum. 

ARTICLE V 

Publications 

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute. 
The Editor of the Annals of Mathematical Statistics shall be a Fellow appointed by the 
Board of Directors of the Institute. The term of office of the Editor may be terminated 
at the discretion of the Board of Directors. 

2. Other publications may be originated by the Board of Directors as occasion arises. 

ARTICLE VI 
Expulsion or Suspension 

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 
Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any regularly 
convened meeting of the Institute provided notice of such proposed amendment 
shall have been sent to each voting member by the Secretary-Treasurer at least thirty 
days before the date of the meeting at which the proposal is to be acted upon. Voting 
may be in person or by mail. 

BY-LAWS 

ARTICLE I 

Duties of the Officers, the Editor, Board of Directors, and 
Committee on Membership 

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the 
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present, 
shall preside at the meetings of the Institute and of the Board of Directors. At meetings 
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings 
of the Board of Directors he may vote in all cases. At least three months before the date 
of the annual meeting, the President shall appoint a Nominating Committee of three 
members. It shall be the duty of the Nominating Committee to make nominations for 
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all 
voting members at least thirty days before the annual meeting. Additional nominations 
may be submitted in writing, if signed by at least ten Fellows of the Institute, up to 
the time of the meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings 
at the meetings of the Institute and of the Board of Directors, send out calls for said 
meetings and, with the approval of the President and the Board, carry on the correspondence 
of the Institute. Subject to the direction of the Board, he shall have charge 
of the archives and other tangible and intangible property of the Institute and once a year 
he shall publish in the Annals of Mathematical Statistics a classified list of all Members and 
Fellows of the Institute. He shall send out calls for annual dues and acknowledge receipt 
of same; pay all bills approved by the President for expenditures authorized by the Board 
or the Institute; keep a detailed account of all receipts and expenditures, prepare a financial 
statement at the end of each year and present an abstract of the same at the annual 
meeting of the Institute after it has been audited by a Member or Fellow of the Institute 
appointed by the President as Auditor. The Auditor shall report to the President. 

3. Subject to the direction of the Board, the Editor shall be charged with the responsibility 
for all editorial matters concerning the editing of the Annals of Mathematical Statistics. 
He shall, with the advice and consent of the Board, appoint an Editorial Committee 
of not less than twelve members to co-operate with him; four for a period of five years, 
four for a period of three years, and the remaining members for a period of two years, 
appointments to be made annually as needed. All appointments to the Editorial Committee 
shall terminate with the appointment of a new Editor. The Editor shall serve as 
editorial adviser in the publication of all scientific monographs and pamphlets authorized 
by the Board. 

4. The Board of Directors shall have charge of the funds and of the affairs of the 
Institute, with the exception of those affairs specifically assigned to the President or to 
the Committee on Membership. The Board shall have authority to fill all vacancies 
ad interim, occurring among the Officers, Board of Directors, or in any of the Committees. 
The Board may appoint such other committees as may be required from time to time 
to carry on the affairs of the Institute. The power of election to the different grades of 
Membership, except the grades of Member and Junior Member, shall reside in the Board. 

5. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the 
different grades of membership. The Committee shall review these qualifications periodically 
and shall make such changes in these qualifications and make such recommendations 
with reference to the number of grades of membership as it deems advisable. The power 
to elect worthy applicants to the grades of Member and Junior Member shall reside in the 
Committee, which may delegate this power to the Secretary-Treasurer, subject to such 
reservations as the Committee considers appropriate. The Committee shall make recommendations 
to the Board of Directors with reference to placing members in other grades 
of membership. The Committee shall give its attention to the question of increasing the 
number of applicants for membership and shall advise the Secretary-Treasurer on plans 
for that purpose. 

ARTICLE II 
Dues 

1. Members shall pay five dollars at the time of admission to membership and shall 
receive the full current volume of the Official Journal. Thereafter, Members shall pay 
five dollars annual dues. The annual dues of Junior Members shall be two dollars and 
fifty cents. 

The annual dues of Fellows shall be five dollars. The annual dues of Sustaining 
Members shall be fifty dollars. Honorary Members shall be exempt from all dues. 

(a) Exception. In the case that two Members of the Institute are husband and wife 
and they elect to receive between them only one copy of the Official Journal, the annual 
dues of each shall be three dollars and seventy-five cents. 

(b) Exception. Any Member or Fellow may make a single payment which will be 
accepted by the Institute in place of all succeeding yearly dues and which will not otherwise 
alter his status as a Member or Fellow. The amount of this payment will depend 
upon the age of this Member or Fellow and will be based upon a suitable table and rate of 
interest, to be specified by the Board of Directors. 

(c) Exception. Any Member or Junior Member of the Institute serving, except as a 
commissioned officer, in the Armed Forces of the United States or of one of its allies, may 
upon notification to the Secretary-Treasurer be excused from the payment of dues until the 
January first following his discharge from the Service. He shall have all privileges of 
membership except that he shall not receive the Official Journal. However, during the 
first year of his resumed regular membership he may have the right to purchase, at $2.50 
per volume, one copy of each volume of the Official Journal published during the period 
of his service membership. 

2. Annual dues shall be payable on the first day of January of each year. 

3. The annual dues of a Fellow, Member, or Junior Member include a subscription to 
the Official Journal. The annual dues of a Sustaining Member include two subscriptions 
to the Official Journal. 

4. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues 
may be six months in arrears, and to accompany such notice by a copy of this Article. 
If such person fail to pay such dues within three months from the date of mailing such 
notice, the Secretary-Treasurer shall report the delinquent one to the Board of Directors, 
by whom the person’s name may be stricken from the rolls and all privileges of membership 
withdrawn. Such person may, however, be reinstated by the Board of Directors 
upon payment of the arrears of dues. 

ARTICLE III 
Salaries 

1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee. 


ARTICLE IV 
Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a 
majority vote at any regularly convened meeting of the Institute, if the proposed amendment 
has been previously approved by the Board of Directors. 

PROBLEMS IN PROBABILITY THEORY 

By Harald Cramér 
University of Stockholm 

1. Introduction. The following survey of problems in probability theory 
has been written for the occasion of the Princeton Bicentennial Conference on 
“The Problems of Mathematics,” Dec. 17-19, 1946. It is strictly confined to 
the purely mathematical aspects of the subject. Thus all questions concerned 
with the philosophical foundations of mathematical probability, or with its 
ever increasing fields of application, will be entirely left out. 

No attempt at completeness has been made, and the choice of the problems 
considered is, of course, highly subjective. It is also necessary to point out 
explicitly that the literature of the war years has only recently—and still far 
from completely—been available in Sweden. Owing to this fact, it is almost 
unavoidable that this paper will be found incomplete in many respects. 

I FUNDAMENTAL NOTIONS 

2. Probability distributions. From a purely mathematical point of view, 
probability theory may be regarded as the theory of certain classes of additive 
set functions, defined on spaces of more or less general types. The basic structure 
of the theory has been set out in a clear and concise way in the well-known 
treatise by Kolmogoroff [53]. We shall begin by recalling some of the main 
definitions. Note that the word additive, when used in connection with sets 
or set functions, will always refer to a finite or enumerable sequence of sets. 

Let $\omega$ denote a variable point in an entirely arbitrary space $\Omega$, and consider 
an additive class $C$ of sets in $\Omega$, such that the whole space $\Omega$ itself is a member of 
$C$. Further, let $P(S)$ be an additive set function, defined for all sets $S$ belonging 
to the class $C$, and suppose that 

$$P(S) \ge 0 \quad \text{for all } S \text{ in } C,$$ 

$$P(\Omega) = 1.$$ 

We shall then say that $P(S)$ is a probability measure, which defines a probability 
distribution in $\Omega$. For any set $S$ in $C$, the quantity $P(S)$ is called the probability 
of the event expressed by the relation $\omega \subset S$, i.e. the event that the variable 
point $\omega$ takes a value belonging to $S$. Accordingly we write 

$$P(S) = P(\omega \subset S).$$ 

Suppose now that $\omega' = g(\omega)$ is a function of the variable point $\omega$, defined 
throughout the space $\Omega$, the values $\omega'$ being points of another arbitrary space 
$\Omega'$. Let $S'$ be a set in $\Omega'$ and denote by $S$ the set of all points $\omega$ such that $\omega' = 
g(\omega)$ belongs to $S'$. Whenever $S$ belongs to $C$, we define a set function $P'(S')$ 
by writing 

$$P'(S') = P(S).$$ 

It is then easy to see that $P'(S')$ is defined for all $S'$ belonging to a certain 
additive class $C'$ in the new space $\Omega'$, and that $P'(S')$ is a probability measure 
in $\Omega'$ such that $P'(S')$ signifies the probability of the event $\omega' \subset S'$ (which is 
equivalent to $\omega \subset S$). We shall say that $P'(S')$ is attached to the probability 
distribution in $\Omega'$ which is induced by the given distribution in $\Omega$ and the function 
$\omega' = g(\omega)$. 
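
On a finite space the induced distribution can be computed directly, which may help fix the definition. The following is a minimal sketch under assumed data: the space (one throw of a fair die) and the function g are hypothetical choices made only for illustration, and every subset is taken to be measurable.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical finite space Omega: the six faces of a fair die.
P = {omega: Fraction(1, 6) for omega in range(1, 7)}

def g(omega):
    """Hypothetical map from Omega into a new space Omega' of integers."""
    return (omega - 3) ** 2

def induced(P, g):
    """Induced measure on points of Omega': P'({w'}) = P({omega : g(omega) = w'})."""
    P_prime = defaultdict(Fraction)   # Fraction() is zero
    for omega, p in P.items():
        P_prime[g(omega)] += p
    return dict(P_prime)

def prob(P_any, S):
    """Probability of a set S, obtained by additivity from the point masses."""
    return sum(p for point, p in P_any.items() if point in S)

P_prime = induced(P, g)
print(P_prime)

# P'(S') agrees with P(S), where S is the set of omega with g(omega) in S'.
S_prime = {1, 4}
S = {omega for omega in P if g(omega) in S_prime}
print(prob(P_prime, S_prime), prob(P, S))  # 2/3 2/3
```

The last line is the defining relation P'(S') = P(S) for one particular set S'.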

3. Random variables. Consider in particular the case when $\omega'$ is a real 
number $\xi$, such that $\xi = g(\omega)$ is a real-valued $C$-measurable function of the 
argument $\omega$. Then $C'$ includes the class $B_1$ of all Borel sets $S'$ of the space $\Omega' = 
R_1$ of all real numbers, and we shall call $\xi$ a one-dimensional real random variable. 
The probability of the event $\xi \subset S'$ is uniquely defined for any Borel set $S'$ of 
$R_1$, as soon as the function 

$$F(x) = P(\xi \le x)$$ 

is known for all real $x$. $F(x)$ is called the distribution function (d.f.) of the 
random variable $\xi$. If the function $\xi = g(\omega)$ is integrable over $\Omega$ with respect 
to the measure $P(S)$, we write 

$$E\xi = \int_{\Omega} g(\omega)\,dP = \int_{-\infty}^{\infty} x\,dF(x),$$ 

and denote this expression as the expectation or mean value of the random variable 
$\xi$. Any real-valued $B_1$-measurable function $\eta = h(\xi)$ is also a random 
variable with the probability distribution induced by the original $\omega$-distribution 
and the function $\eta = h(g(\omega))$. If $\eta$ is integrable over $\Omega$ with respect to $P$, its 
mean value may be written in the form 

$$E\eta = Eh(\xi) = \int_{\Omega} h(g(\omega))\,dP = \int_{-\infty}^{\infty} h(x)\,dF(x).$$ 
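
The two expressions for the mean value of h(ξ) can be compared on a small discrete example, where both integrals reduce to finite sums and dF(x) is the jump of F at x. The sketch below assumes a hypothetical space of two fair coin tosses; the functions g and h are illustrative choices, not taken from the text.

```python
from fractions import Fraction

# Hypothetical finite space Omega: two fair coin tosses, all four points equally likely.
P = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}

def g(omega):
    """xi = g(omega): the number of heads."""
    return omega[0] + omega[1]

def h(x):
    """eta = h(xi)."""
    return x * x

# Integral of h(g(omega)) over Omega with respect to P.
mean_over_omega = sum(h(g(omega)) * p for omega, p in P.items())

# Integral of h(x) dF(x): here dF(x) is the jump of the d.f. of xi at x,
# i.e. the probability that xi takes the value x.
jumps = {}
for omega, p in P.items():
    jumps[g(omega)] = jumps.get(g(omega), Fraction(0)) + p
mean_over_line = sum(h(x) * p for x, p in jumps.items())

print(mean_over_omega, mean_over_line)  # 3/2 3/2
```

Both sums give the same value, as the formula requires; the second never refers to the underlying space, only to the distribution of ξ.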

More generally, if $\omega' = (\xi_1, \dots, \xi_n)$ is a point in an n-dimensional Euclidean 
space $R_n$, while $C'$ includes the class $B_n$ of all Borel sets of $R_n$, we are concerned 
with an n-dimensional real random variable. The distribution of this 
variable, which is also called the joint distribution of the n one-dimensional 
variables $\xi_1, \dots, \xi_n$, is uniquely defined, as soon as the joint d.f. 

$$F(x_1, \dots, x_n) = P(\xi_1 \le x_1, \dots, \xi_n \le x_n)$$ 

is known for all real $x_1, \dots, x_n$. 

The variables $\xi_1, \dots, \xi_n$ are said to be independent, if 
$F(x_1, \dots, x_n) = F_1(x_1) \cdots F_n(x_n)$, where $F_\nu(x_\nu)$ is the d.f. of the variable $\xi_\nu$. 

The extension to complex random variables is obvious. Suppose e.g. that 
$\xi = g(\omega)$ and $\eta = h(\omega)$ are two one-dimensional real variables, and consider 
the complex variable $\xi + i\eta = g(\omega) + ih(\omega)$. By definition, we identify the 
distribution of this variable with that of the two-dimensional real variable 
$(\xi, \eta)$, and we put 

$$E(\xi + i\eta) = E\xi + iE\eta.$$ 

Joint distributions of several complex variables are introduced in a corresponding 
way. 

4. Characteristic functions. If $\xi$ is a one-dimensional real random variable, 
the mean value 

$$\varphi(z) = Ee^{iz\xi} = \int_{-\infty}^{\infty} e^{izx}\,dF(x)$$ 

exists for all real $z$, and we have 

$$|\varphi(z)| \le 1, \qquad \varphi(0) = 1.$$ 

$\varphi(z)$ is called the characteristic function (c.f.) of the distribution corresponding 
to the variable $\xi$. The reciprocal formula (Lévy) 

$$F(x) - F(y) = -\frac{1}{2\pi i} \lim_{Z \to \infty} \int_{-Z}^{Z} \frac{e^{-izx} - e^{-izy}}{z}\,\varphi(z)\,dz,$$ 

which holds for any continuity points $x$ and $y$ of $F$, shows that there is a one-one 
correspondence between the d.f. $F(x)$ and the c.f. $\varphi(z)$. As we shall see 
below, the c.f. provides a powerful analytical tool for operations with probability 
distributions. 
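
As a numerical illustration of the reciprocal formula, the sketch below recovers an increment F(x) - F(y) from the c.f. of the standard normal distribution, which is exp(-z²/2), and compares it with the value obtained from the error function. The truncation point Z, the number of subintervals and the midpoint rule are assumptions of this sketch and not part of the formula itself.

```python
import cmath
import math

def phi_normal(z):
    """c.f. of the standard normal distribution."""
    return math.exp(-0.5 * z * z)

def increment_from_cf(phi, x, y, Z=12.0, n=24000):
    """Approximate F(x) - F(y) = -(1/(2*pi*i)) * integral over (-Z, Z) of
    (exp(-izx) - exp(-izy)) / z * phi(z) dz, using the midpoint rule."""
    step = 2.0 * Z / n
    total = 0j
    for k in range(n):
        z = -Z + (k + 0.5) * step   # midpoints never hit z = 0 when n is even
        kernel = (cmath.exp(-1j * z * x) - cmath.exp(-1j * z * y)) / z
        total += kernel * phi(z)
    return (-(total * step) / (2j * math.pi)).real

def normal_df(x):
    """d.f. of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

approx = increment_from_cf(phi_normal, 1.0, -1.0)
exact = normal_df(1.0) - normal_df(-1.0)
print(approx, exact)
```

For this rapidly decreasing c.f. a moderate Z already suffices; for a c.f. that decays slowly, the limit in Z has to be handled with more care than this sketch takes.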

When a complex-valued function $\varphi(z)$ of the real variable $z$ is given, it is 
often important to be able to decide whether $\varphi(z)$ is or is not the c.f. of some 
distribution. If we assume a priori that $\varphi(0) = 1$, each of the following conditions 
is necessary and sufficient for $\varphi(z)$ to be a c.f. 

A. $\varphi(z)$ should be bounded and continuous for all $z$, and such that the integral 

$$\int_0^A \int_0^A \varphi(z - u)\,e^{ix(z-u)}\,dz\,du$$ 

is real and non-negative for all real $x$ and all $A > 0$ (Cramér [11], in simplification 
of an earlier result due to Bochner, [4]). 

B. There should exist a sequence of functions $\psi_1(z), \psi_2(z), \dots$ such that 

$$\varphi(z) = \lim_{n \to \infty} \int_{-\infty}^{\infty} \psi_n(x + z)\,\overline{\psi_n(x)}\,dx$$ 

holds uniformly in every finite $z$-interval (Khintchine, [45]). 

These general theorems are not always easy to apply in practice. Among 
less general results which are more easily applicable, we mention the almost 
trivial fact that a function $\varphi(z)$ which near $z = 0$ is of the form $\varphi(z) = 1 + o(z^2)$ 
cannot be a c.f. unless $\varphi(z) = 1$ for all $z$, and the two following theorems: 

1) An integral function $\varphi(z)$ of order $\gamma < 1$ can never be a c.f. (Lévy, [64]), and 

2) an integral function $\varphi(z)$ of finite order $\gamma > 2$ cannot be a c.f. unless the 
convergence exponent of its zeros is equal to $\gamma$ (Marcinkiewicz, [72]). The 
latter result shows e.g. that no function of the form $e^{g(z)}$, where $g(z)$ is a polynomial 
of degree $> 2$, can be a c.f. 
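
A concrete instance of the last remark is the function exp(-z⁴), which near z = 0 is of the form 1 + o(z²). One elementary way to see numerically that it fails to be a c.f. is to test a standard necessary condition that is not stated in the text above and is introduced here only as an illustration: every real-valued c.f. satisfies 1 - φ(2z) ≤ 4(1 - φ(z)), since 1 - cos 2u ≤ 4(1 - cos u). The sketch below checks this inequality on a grid of points chosen arbitrarily.

```python
import math

def violates_doubling_inequality(phi, z):
    """True if 1 - phi(2z) > 4 * (1 - phi(z)), which no real-valued c.f. allows."""
    return 1.0 - phi(2.0 * z) > 4.0 * (1.0 - phi(z))

def phi_quartic(z):
    """exp(g(z)) with g(z) = -z**4, a polynomial of degree 4."""
    return math.exp(-z ** 4)

def phi_gauss(z):
    """c.f. of the standard normal distribution, for comparison."""
    return math.exp(-0.5 * z * z)

grid = [0.05 * k for k in range(1, 41)]   # arbitrary test points in (0, 2]

quartic_fails = any(violates_doubling_inequality(phi_quartic, z) for z in grid)
gauss_fails = any(violates_doubling_inequality(phi_gauss, z) for z in grid)

print(quartic_fails, gauss_fails)  # True False
```

The check is only a necessary test: a violation rules a function out, while passing it on a grid proves nothing.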

It would be highly desirable to obtain further results in this direction. 

The c.f. of the joint distribution of n real random variables $\xi_1, \dots, \xi_n$ is the 
function $\varphi(z_1, \dots, z_n)$ defined by the relation 

$$\varphi(z_1, \dots, z_n) = Ee^{i(z_1\xi_1 + \cdots + z_n\xi_n)}.$$ 

Most of the above results for c.f:s in one variable can be directly generalized 
to the multi-variable case. 

5. Random sequences and random functions. Let t be a variable point in
an arbitrary space T, and consider the space $\Omega$, where each point $\omega$ is a
real-valued function $\omega = x(t)$ of the variable argument t. Let $t_1, \ldots, t_n$ be any
finite set of distinct points t. The set of all functions $\omega = x(t)$ satisfying the
inequalities

$$a_j < x(t_j) \le b_j, \qquad (j = 1, \ldots, n),$$

will be called an interval in the space $\Omega$. The Borel sets in $\Omega$ will be defined as
the smallest additive class B of sets in $\Omega$ containing all intervals.
Suppose now that, for any choice of n and the $t_j$, the variables $x(t_1), \ldots, x(t_n)$
are random variables having a known n-dimensional joint distribution. If the
family of all distributions corresponding in this way to finite sequences
$t_1, \ldots, t_n$ satisfies certain obvious consistency conditions, a fundamental theorem
due to Kolmogoroff asserts that this family determines a unique probability
distribution in the space $\Omega$ of all functions x(t). The corresponding probability

$$P(S) = P(x(t) \subset S)$$

is uniquely defined for all Borel sets S of $\Omega$.

Consider in particular the case where T is the set of non-negative integers
t = 0, 1, 2, .... The space $\Omega$ then is the space of all sequences $(x_0, x_1, \ldots)$
of real numbers. As soon as the joint distribution of any finite number of
variables $x_{\nu_1}, \ldots, x_{\nu_n}$ is defined, and these distributions are mutually
consistent, it then follows that there is a unique probability distribution of the
random sequence $(x_0, x_1, \ldots)$, the corresponding probability being defined
for every Borel set of the space $\Omega$ of sequences. Similarly we may consider the
doubly infinite sequence $(\ldots, x_{-1}, x_0, x_1, \ldots)$.

Consider further the more general case when T is any set of real numbers.
Then $\Omega$ is the space of all real-valued functions $\omega = x(t)$ defined on the set T,
and as before the knowledge of the distributions for all finite sets of variables
$x(t_1), \ldots, x(t_n)$ permits us to determine a probability distribution in the space
$\Omega$ of random functions x(t), the probability $P(S) = P(x(t) \subset S)$ being always
defined for all Borel sets S in $\Omega$.

The generalization of the above considerations to complex-valued random
sequences and functions is immediate.

6. Various modes of convergence. Consider a sequence $F_1(x), F_2(x), \ldots$
of d.f.'s and let the corresponding c.f.'s be $\varphi_1(t), \varphi_2(t), \ldots$. In order that $F_n(x)$
converge to a d.f. F(x), in every continuity point of the latter, it is necessary
and sufficient¹ that $\varphi_n(t)$ converge for every real t to a limit $\varphi(t)$ which is
continuous at t = 0. Then $\varphi(t)$ is the c.f. corresponding to the d.f. F(x).

Further, let x and $x_1, x_2, \ldots$ be complex-valued random variables, such
that the random sequence $(x, x_1, x_2, \ldots)$ has a well defined distribution. We
shall be concerned with various modes of convergence of $x_n$ to x.

A. When $P(|x_n - x| > \epsilon) \to 0$ as $n \to \infty$, for any $\epsilon > 0$, we shall say that
$x_n$ converges to x in probability.

B. When $E\,|x_n - x|^\gamma \to 0$, as $n \to \infty$, where $\gamma > 0$ is fixed, we shall say that
$x_n$ converges to x in the mean of order $\gamma$. Unless otherwise stated we shall in
the sequel always consider the case $\gamma = 2$, and in this case we shall use the
notation

$$\mathrm{l.i.m.}_{n \to \infty}\; x_n = x.$$

C. When $P(\lim_{n \to \infty} x_n = x) = 1$, we shall say that $x_n$ converges with probability
one, or converges almost certainly to x.

With respect to the last definition, we may remark that the set defined by
the relation $\lim x_n = x$ is always a Borel set in the space of our random sequence,
so that the probability of this relation is well defined. In fact, this probability
is given by the expression

$$\lim_{m \to \infty} \lim_{n \to \infty} \lim_{p \to \infty} P\left(|x_\nu - x| < \frac{1}{m} \ \text{for} \ \nu = n, n + 1, \ldots, n + p\right),$$

where the limit process applies to a probability attached to a Borel set in a finite
number of dimensions. The case of almost certain convergence is precisely
the case when this expression takes the value 1.

Convergence in the mean of any positive order, as well as almost certain
convergence, both imply convergence in probability, which may be written
symbolically B → A and C → A. Between B and C, there is no simple relation
of this kind. Further, A and B both imply almost certain convergence for any
partial sequence $x_{n_1}, x_{n_2}, \ldots$ such that the subscripts $n_k$ increase sufficiently
rapidly with k.

¹ As I have already stated in a paper published in 1938, there is an error in the
statement of this theorem given in my Cambridge Tract [9], Random Variables and
Probability Distributions. For the truth of the theorem, it is essential that $\varphi_n(t)$
should be supposed to converge to $\varphi(t)$ for every real t. However, in the particular
case when the limit $\varphi(t)$ is analytic and regular in the vicinity of t = 0, it can be
proved that it is sufficient to assume convergence in some interval |t| < a.

II. PROBLEMS CONNECTED WITH THE ADDITION OF
INDEPENDENT VARIABLES

7. During the early development of the theory of probability, the majority
of problems considered were connected with gambling. The gain of a player
in a certain game may be regarded as a random variable, and his total gain in a
sequence of repetitions of the game is the sum of a number of independent
variables, each of which represents the gain in a single performance of the game.
Accordingly a great amount of work was devoted to the study of the probability
distributions of such sums. A little later, problems of a similar type appeared
in connection with the theory of errors of observation, when the total error was
considered as the sum of a certain number of partial errors due to mutually
independent causes. At first only particular cases were considered, but gradually
general types of problems began to arise, and in the classical work of Laplace
several results are given concerning the general problem to study the distribution
of a sum

$$z_n = x_1 + \cdots + x_n$$

of independent variables, when the distributions of the $x_\nu$ are given. This
problem may be regarded as the very starting point of a large number of those
investigations by which the modern Theory of Probability was created. The
efforts to prove certain statements of Laplace, and to extend his results further
in various directions, have largely contributed to the introduction of rigorous
foundations of the subject, and to the development of the analytical methods.
At the same time, more general types of problems have developed from the
original problem, and the number and importance of practical applications
have been steadily increasing.
8. Composition of distributions. Let $x_1$ and $x_2$ be two independent variables,
with the d.f.'s $F_1$ and $F_2$, and the c.f.'s $\varphi_1$ and $\varphi_2$, and let the sum $x_1 + x_2$ have
the d.f. F and the c.f. $\varphi$. Then

$$F(x) = \int_{-\infty}^{\infty} F_1(x - y)\, dF_2(y) = \int_{-\infty}^{\infty} F_2(x - y)\, dF_1(y).$$

We shall say that F is the composition of $F_1$ and $F_2$, and write this as a symbolical
multiplication:

$$F = F_1 * F_2 = F_2 * F_1.$$

To this symbolical multiplication of the d.f.'s corresponds a real multiplication
of the c.f.'s:

$$\varphi(z) = \varphi_1(z)\, \varphi_2(z).$$
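
The correspondence between composition and multiplication of c.f.'s can be checked directly for discrete distributions. The following Python sketch is an illustration only: the helper names `compose` and `cf`, and the choice of a die and a coin as the two components, are assumptions made here and do not come from the text.

```python
import cmath

def compose(p1, p2):
    """Distribution of x1 + x2 for independent discrete x1, x2.

    p1 and p2 map each attainable value to its probability."""
    out = {}
    for v1, q1 in p1.items():
        for v2, q2 in p2.items():
            out[v1 + v2] = out.get(v1 + v2, 0.0) + q1 * q2
    return out

def cf(p, z):
    """Characteristic function E exp(i z x) of a discrete distribution p."""
    return sum(q * cmath.exp(1j * z * v) for v, q in p.items())

die = {k: 1.0 / 6.0 for k in range(1, 7)}
coin = {0: 0.5, 1: 0.5}
total = compose(die, coin)

# Largest discrepancy between the c.f. of the composition and the product of c.f.'s.
gap = max(abs(cf(total, z) - cf(die, z) * cf(coin, z)) for z in (0.3, 1.0, 2.5))
print(total)
print("largest discrepancy:", gap)
```

The discrepancy is of the order of rounding error, and interchanging the arguments of `compose` gives the same distribution, as commutativity requires.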

The operation of composition is both commutative and associative, so that
any symbolical product $F = F_1 * F_2 * \cdots * F_n$ is uniquely defined and independent
of the order of the components. When at least one of the components is
continuous (absolutely continuous), the same holds for the composite, and in
many cases it is true that the composite is at least as regular as the most regular
of the components (Lévy, [58], [63], etc.). However, this statement
does not hold generally, as is shown by an interesting example due to Raikov,
[77], where $F_1$ and $F_2$ are integral analytic functions, while the composite
$F = F_1 * F_2$ is not regular at the origin.

It seems to be an important unsolved problem to find convenient restrictions
ensuring the validity of the above statements of the “smoothing effect” of
the operation of composition.

When $F = F_1 * F_2$, we may say that F is “divisible” by each component $F_1$
and $F_2$, and it seems natural to try to develop a theory of symbolical factorization
for d.f.'s. In this connection, it is important to note that symbolical division
is not unique. In fact, Khintchine has shown by an example that it is
possible to find the d.f.'s $F$, $F_1$, $F_2$ and $F_3$ such that

$$F = F_1 * F_2 = F_1 * F_3,$$

while $F_2 \ne F_3$. Another fundamental problem belonging to this order of ideas
is to decide whether a given d.f. F is decomposable or not. F is called decomposable,
if there is at least one representation of the form $F = F_1 * F_2$, where
each component $F_\nu$ has more than one point of increase. So far, this problem
has only been solved in very special cases, and the general problem still remains
open for research. A particular case of some interest would be to know
if there exists an absolutely continuous and indecomposable d.f., such that
F(a) = 0 and F(b) = 1 for some finite a and b.

As soon as we restrict ourselves to certain special classes of distributions,
it is possible to reach results of a more definite character concerning the
factorization problems. Some results of this type will be considered below.


9. Closed families of distributions. The fact that certain families of
distributions are closed with respect to the operation of composition has played
an important part in many applications. If $F_1$ and $F_2$ belong to a family of
this character, so does the symbolical product $F = F_1 * F_2$. We first give some
simple examples of such families.

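The normal distribution. The d.f. F has the form $F = \Phi\left(\frac{x - m}{\sigma}\right)$, where
$\sigma > 0$, and

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.$$

The c.f. corresponding to F is $e^{miz - \sigma^2 z^2/2}$, and it follows that for any real
$m_1$, $m_2$ and any positive $\sigma_1$, $\sigma_2$ we have

$$\Phi\left(\frac{x - m_1}{\sigma_1}\right) * \Phi\left(\frac{x - m_2}{\sigma_2}\right) = \Phi\left(\frac{x - m}{\sigma}\right),$$

where

$$m = m_1 + m_2, \qquad \sigma^2 = \sigma_1^2 + \sigma_2^2.$$
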

The Poisson distribution. Here the d.f. is $F = F(x; \lambda, m, a)$ where $\lambda > 0$,
$a \ne 0$, and F is a step-function with a jump equal to $\frac{\lambda^\nu}{\nu!} e^{-\lambda}$ in the point
$x = m + \nu a$, where $\nu = 0, 1, \ldots$. The corresponding c.f. is $e^{miz + \lambda(e^{iaz} - 1)}$, and it
follows that for any fixed a we have

$$F(x; \lambda_1, m_1, a) * F(x; \lambda_2, m_2, a) = F(x; \lambda_1 + \lambda_2, m_1 + m_2, a).$$

The Pearson Type III distribution,

$$F = F(x; a, \lambda) = \frac{a^\lambda}{\Gamma(\lambda)} \int_0^x t^{\lambda - 1} e^{-at}\, dt \qquad (x > 0).$$

The corresponding c.f. is $\left(1 - \frac{iz}{a}\right)^{-\lambda}$, and for any fixed a > 0 and any
positive $\lambda_1$ and $\lambda_2$ we have

$$F(x; a, \lambda_1) * F(x; a, \lambda_2) = F(x; a, \lambda_1 + \lambda_2).$$
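
The closure of these three families can be verified at the level of c.f.'s, since composition corresponds to multiplication. The Python sketch below assumes the standard forms of the three c.f.'s, namely $e^{miz - \sigma^2 z^2/2}$, $e^{miz + \lambda(e^{iaz} - 1)}$ and $(1 - iz/a)^{-\lambda}$; the function names and the parameter values are illustrative assumptions.

```python
import cmath

def cf_normal(z, m, sigma):
    """c.f. of the normal law with mean m and standard deviation sigma."""
    return cmath.exp(1j * m * z - 0.5 * (sigma * z) ** 2)

def cf_poisson(z, lam, m, a):
    """c.f. of the Poisson-type law with jumps at m + nu * a, nu = 0, 1, ..."""
    return cmath.exp(1j * m * z + lam * (cmath.exp(1j * a * z) - 1.0))

def cf_type3(z, a, lam):
    """c.f. of the Pearson Type III law with parameters a and lam."""
    return (1.0 - 1j * z / a) ** (-lam)

def closure_gap(cf, params1, params2, params_sum, zs):
    """Largest |cf1(z) * cf2(z) - cf_sum(z)| over the test points zs."""
    return max(abs(cf(z, *params1) * cf(z, *params2) - cf(z, *params_sum)) for z in zs)

zs = [-3.0, -0.7, 0.4, 2.2]
# Normal: means add, variances add (2.0^2 + 1.5^2 = 2.5^2).
gap_normal = closure_gap(cf_normal, (1.0, 2.0), (-0.5, 1.5), (0.5, 2.5), zs)
# Poisson with common step a = 0.5: the lambdas and the m's add.
gap_poisson = closure_gap(cf_poisson, (0.8, 0.2, 0.5), (1.7, -1.0, 0.5), (2.5, -0.8, 0.5), zs)
# Type III with common a = 2.0: the lambdas add.
gap_type3 = closure_gap(cf_type3, (2.0, 0.6), (2.0, 1.9), (2.0, 2.5), zs)
print(gap_normal, gap_poisson, gap_type3)
```

All three discrepancies are at rounding level, which is the numerical counterpart of the three composition formulas above.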

Stable distributions. We shall say that a closed family is stable, when all
its members are of the form F(ax + b), where F is a d.f., while a > 0 and b are
constants. Obviously the normal family is an example of a stable family. It
has been shown by Lévy and Khintchine [49] that a d.f. F(x) generates a stable
family when and only when the logarithm of its c.f. is of the form

$$(9.1) \qquad \log \varphi(z) = i\beta z - \gamma\, |z|^\alpha \left(1 + i\delta\, \frac{z}{|z|}\, \omega\right),$$

where $\alpha$, $\beta$, $\gamma$, $\delta$ are real constants such that

$$0 < \alpha \le 2, \qquad \gamma > 0, \qquad |\delta| \le 1,$$

while

$$\omega = \begin{cases} \tan \dfrac{\pi\alpha}{2} & \text{for } \alpha \ne 1, \\[2mm] \dfrac{2}{\pi} \log |z| & \text{for } \alpha = 1. \end{cases}$$

For $\alpha = 2$ we obtain the normal family.
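
A small Python sketch may help in reading (9.1). It assumes the form $\log \varphi(z) = i\beta z - \gamma |z|^\alpha (1 + i\delta \frac{z}{|z|} \omega)$ with $\omega = \tan(\pi\alpha/2)$ for $\alpha \ne 1$ and $\omega = \frac{2}{\pi}\log|z|$ for $\alpha = 1$; the function name and the numerical parameter values are illustrative assumptions.

```python
import cmath
import math

def log_cf_stable(z, alpha, beta, gamma, delta):
    """Logarithm of a stable c.f.; assumes 0 < alpha <= 2, gamma > 0, |delta| <= 1."""
    if z == 0.0:
        return 0j
    if alpha == 1.0:
        omega = (2.0 / math.pi) * math.log(abs(z))
    else:
        omega = math.tan(math.pi * alpha / 2.0)
    sign = z / abs(z)
    return 1j * beta * z - gamma * abs(z) ** alpha * (1.0 + 1j * delta * sign * omega)

# alpha = 2: tan(pi) vanishes (up to rounding), leaving the normal form i*beta*z - gamma*z^2.
normal_case = log_cf_stable(1.5, 2.0, 0.3, 0.7, 0.4)
# alpha = 1, delta = 0: the symmetric case -gamma*|z| (the Cauchy law).
cauchy_case = log_cf_stable(-2.0, 1.0, 0.0, 1.0, 0.0)
# In every case the modulus of the c.f. is exp(-gamma * |z|^alpha) <= 1.
modulus = abs(cmath.exp(log_cf_stable(0.8, 1.5, 0.1, 2.0, -0.6)))
print(normal_case, cauchy_case, modulus)
```

For $\alpha = 2$ the term containing $\delta$ disappears, so that $\delta$ plays no part in the normal case.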

A more general and very important closed family is the family I of infinitely
divisible distributions. A d.f. F belongs to I if to every n = 1, 2, ... there
exists a d.f. G such that $F = G^{[n]}$, where $G^{[n]}$ denotes the symbolical nth power
of G. Obviously the family I is a closed family which contains all the families
mentioned above. Lévy [60], [63], has shown that F is infinitely divisible when
and only when the logarithm of its c.f. is of the form

$$(9.2) \qquad \log \varphi(z) = i\beta z - \gamma z^2 + \int_{-\infty}^{0} \left(e^{izu} - 1 - \frac{izu}{1 + u^2}\right) dM(u) + \int_{0}^{\infty} \left(e^{izu} - 1 - \frac{izu}{1 + u^2}\right) dN(u),$$

where $\beta$ and $\gamma \ge 0$ are real constants, while M(u) and N(u) are non-decreasing
functions such that

$$M(-\infty) = N(+\infty) = 0,$$

$$\int_{-a}^{0} u^2\, dM(u) < \infty \quad \text{and} \quad \int_{0}^{a} u^2\, dN(u) < \infty$$

for any finite a > 0. When M and N reduce to zero, we obtain the normal
family. When $\gamma = 0$ and one of the functions M and N reduces to zero, while
the other is a step-function with a single jump equal to $\lambda$ at the point x = a,
we obtain a Poisson family. Generally, it follows from (9.2) that any infinitely
divisible distribution may be regarded as a product of a normal distribution
and a finite, enumerable or continuous set of Poisson distributions.
The representation of $\log \varphi(z)$ in the form (9.2) is unique. It follows that
the problem of finding all possible factorizations of an infinitely divisible d.f. F
can be completely solved, as long as we restrict ourselves to factors which are
themselves infinitely divisible. In fact, in order that

$$F = F_1 * F_2,$$

where all three d.f.'s belong to I, it is necessary and sufficient that the logarithms
of the corresponding c.f.'s should be of the form (9.2), with

$$\beta = \beta_1 + \beta_2, \qquad \gamma = \gamma_1 + \gamma_2,$$

$$M = M_1 + M_2, \qquad N = N_1 + N_2.$$

In the two simple cases of the normal and the Poisson distributions, the
decompositions obtained in this way remain the only possible, even if we remove
the restriction that the factors should belong to I. Thus in any factorization
of a normal distribution, all factors are normal (Cramér, [8]), while in any
factorization of a Poisson distribution, all factors belong to the Poisson family
(Raikov, [75]). For the type III distribution, and the non-normal stable
distributions, however, the corresponding property does not hold.

In some cases, an infinitely divisible distribution may be represented as a
product of indecomposable distributions, or as a product of an indecomposable
distribution and another infinitely divisible distribution. The results so far
obtained in this direction (Lévy, [63], [64]; Khintchine, [46], [47]; Raikov, [76])
are all concerned with more or less particular cases, and the general factorization
problem for infinitely divisible distributions still remains unsolved. A
particular case of some interest would be the case when the functions M and N
are both absolutely continuous. There does not seem to have been given any
example of this type, where a factor not belonging to I may occur.²

Finally we mention a general theorem due to Khintchine, [46], which asserts
that an arbitrary d.f. F may be represented in one of the forms

$$F = G, \qquad F = H \qquad \text{or} \qquad F = G * H,$$

where G is infinitely divisible, while H is a finite or infinite product of
indecomposable factors. This seems to be practically the only result so far known
concerning the factorization of a general distribution.

A certain number of the results mentioned above have been generalized to 
multi-dimensional distributions. 


² While the present paper was being printed, I have proved that such factors do occur,
as soon as at least one of the derivatives M' and N' is bounded away from zero in some
interval (−a, 0) or (0, a).

10. The Laws of large numbers. In modern terminology, the classical
Bernoulli theorem may be expressed in the following way. Let $x_1, x_2, \ldots$ be
a sequence of independent variables, such that each $x_n$ may only assume the
values 1 and 0, the corresponding probabilities being p and q = 1 − p. Then
the arithmetic mean

$$(10.1) \qquad \frac{z_n}{n} = \frac{x_1 + \cdots + x_n}{n}$$

converges in probability to p, as $n \to \infty$.
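
The Bernoulli theorem can be illustrated without simulation, since $z_n$ has a binomial distribution and the probability $P(|z_n/n - p| > \epsilon)$ is a finite sum. The Python sketch below evaluates this sum for increasing n; the function name and the values p = 0.3, $\epsilon$ = 0.05 are assumptions chosen for illustration.

```python
import math

def deviation_probability(n, p, eps):
    """Exact P(|z_n / n - p| > eps) when z_n is binomial with parameters n and p."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) > eps:
            # Logarithm of the binomial probability, to avoid overflow for large n.
            log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                       + k * math.log(p) + (n - k) * math.log(1.0 - p))
            total += math.exp(log_pmf)
    return total

probabilities = {n: deviation_probability(n, 0.3, 0.05) for n in (10, 100, 1000)}
for n, value in probabilities.items():
    print(n, value)
```

The printed probabilities decrease towards zero as n grows, which is exactly the statement that $z_n/n$ converges in probability to p.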

Both classical and modern authors have devoted much work to the
generalization of this simple result in various directions. Generally, we shall say
that a sequence of random variables $x_1, x_2, \ldots$ satisfies the Weak Law of Large
Numbers if there exist two sequences of constants $a_1, a_2, \ldots$ and $b_1, b_2, \ldots$,
such that $a_n > 0$, and

$$\frac{z_n - b_n}{a_n} = \frac{x_1 + \cdots + x_n - b_n}{a_n}$$

converges in probability to zero.

Let $x_1, x_2, \ldots$ be independent variables, such that $x_\nu$ has the d.f. $F_\nu(x)$.
It has been shown by Feller [27] that for any given sequence $a_1, a_2, \ldots$, the
conditions

$$(10.2) \qquad \sum_{\nu=1}^{n} \int_{|x| > a_n} dF_\nu(x) = o(1), \qquad \sum_{\nu=1}^{n} \int_{|x| < a_n} x^2\, dF_\nu(x) = o(a_n^2),$$

are sufficient for the validity of the weak law of large numbers, and that the
corresponding sequence $b_1, b_2, \ldots$ can be defined by

$$b_n = \sum_{\nu=1}^{n} \int_{|x| < a_n} x\, dF_\nu(x).$$

When there is a constant c > 0 such that for all $\nu$

$$(10.3) \qquad F_\nu(+0) > c, \qquad F_\nu(-0) < 1 - c,$$

the conditions are also necessary. This theorem contains as particular cases
all previously known results in this direction. A simple necessary and sufficient
(NS) condition for the existence of at least one sequence $a_1, a_2, \ldots$ such that
(10.2) holds does not seem to be known.

When the weak law is satisfied, this means that, for any given $\epsilon > 0$ and for
any fixed large n, there is a probability very near to 1 that the sum
$z_n = x_1 + \cdots + x_n$ will fall between the limits $b_n \pm \epsilon a_n$. The more stringent
condition that, with a probability tending to 1 as $n \to \infty$, $z_\nu$ will fall between the
limits $b_\nu \pm \epsilon a_\nu$ for all values of $\nu \ge n$ is equivalent to the condition that
$\frac{z_n - b_n}{a_n}$ converges almost certainly to zero. When this holds, we shall say that
the variables $x_\nu$ satisfy the Strong Law of Large Numbers. The most important
result so far known in this connection is concerned with the case $a_n = n$, and is
expressed by the following theorem (Kolmogoroff, [52], [55]):

When the $x_\nu$ are independent and (10.3) holds, a sufficient condition for the
validity of the strong law with $a_n = n$ consists in the simultaneous convergence of the
two series



Some improved conditions of this type have been given by Marcinkiewicz
and Zygmund, [73], but the problem of finding a NS condition for the strong
law is still unsolved, even in the case $a_n = n$.

Important generalizations of the laws of large numbers to cases when the
$x_\nu$ are not assumed to be independent have been given i.a. by Khintchine [44],
Lévy [62], [63] and Loève [67].

11. The central limit theorem and allied theorems. It was already known
to De Moivre that, in the case (10.1) of the Bernoulli distribution, the d.f. of
the normalized sum

$$\frac{x_1 + \cdots + x_n - np}{\sqrt{npq}}$$

tends, as $n \to \infty$, to the normal d.f. $\Phi(x)$. Considerably more general results
in this direction were stated by Laplace. After a long series of more or less
successful attempts, a rigorous proof of the main statements of Laplace was
given in 1901 by Liapounoff, [65]. More general cases were later considered i.a.
by Lindeberg [66], Lévy [61], [63], Khintchine [43] and Feller, [25]. The following
final form of the Central Limit Theorem is due to Feller.
Consider the expression

$$(11.1) \qquad u_n = \frac{z_n - b_n}{a_n} = \frac{x_1 + \cdots + x_n - b_n}{a_n},$$

where the $x_\nu$ are independent variables. We shall say that the $x_\nu$ obey the
central limit law, if the sequences $\{a_\nu\}$ and $\{b_\nu\}$ can be found such that the
d.f. of $u_n$ tends to $\Phi(x)$ as $n \to \infty$. In order to avoid unnecessary complications,
we shall restrict ourselves to sequences $\{a_\nu\}$ such that

$$a_\nu \to +\infty, \qquad \frac{a_{\nu+1}}{a_\nu} \to 1,$$

and we shall assume that the conditions (10.3) are satisfied. Then Feller's
theorem runs as follows:

The independent variables $x_1, x_2, \ldots$ obey the central limit law if, and only if,
there exists a sequence $q_n \to \infty$ such that simultaneously

$$(11.2) \qquad \sum_{\nu=1}^{n} \int_{|x| > q_n} dF_\nu(x) \to 0, \qquad \frac{1}{q_n^2} \sum_{\nu=1}^{n} \int_{|x| < q_n} x^2\, dF_\nu(x) \to \infty.$$

When these conditions are satisfied, explicit expressions for the $a_n$ and $b_n$ can be
obtained.

Feller's theorem gives a complete solution of the problem. However, we
might still try to express in a more direct way the condition that the $q_n$ should
exist. We may also ask what happens when the conditions (11.2) are not
satisfied. Some particular cases of the latter question will be considered below.
However, very few general results are known in this direction.

The central limit theorem has been extended in various directions. Bernstein
[3], Lévy [62], [63], Loève [67] and others have considered cases where the
$x_\nu$ are not assumed to be independent. Important results have been reached
but still much remains to be done.

On the other hand, several authors have considered symmetrical functions,
other than sums, of n independent random variables. The problem of
investigating the asymptotic behaviour of the distributions of such functions, as n
tends to infinity, is of great importance in the theory of statistical sampling
distributions. It is known (cf. e.g. Cramér, [15]) that under certain general
regularity conditions there exists a normal limiting distribution. However, it
is also known that it is possible to give examples of particular functions (such
as e.g. the function which is equal to the largest of the n variables), where there
exist limiting distributions which are non-normal. The conditions under
which this phenomenon may occur seem to deserve further study.

A further problem belonging to the same order of ideas is to find a closer
asymptotic representation of the d.f. of the standardized sum $z_n$ than that
provided by the normal function $\Phi(x)$. Consider e.g. the simple case when the $x_\nu$
are independent variables all having the same d.f. F(x) with a finite mean m, a
finite variance $\sigma^2$, and finite moments up to a certain order $k \ge 3$. Let $G_n(x)$
be the d.f. of the variable

$$\frac{x_1 + \cdots + x_n - nm}{\sigma \sqrt{n}}.$$

It then follows from a theorem of Cramér [5], [9] that, as soon as the d.f. F(x)
contains an absolutely continuous component, there is an asymptotic expansion

$$(11.3) \qquad G_n(x) = \Phi(x) + \sum_{\nu=1}^{k-3} \frac{p_\nu(x)}{n^{\nu/2}}\, e^{-x^2/2} + O\left(n^{-(k-2)/2}\right),$$

where the $p_\nu(x)$ are polynomials in x, and the constant implied by the O is
independent of n and x. Cramér has also given similar expansions in more general
cases, and his results have been further extended by P. L. Hsu [39], who deduces
analogous expansions also for other functions than sums. The most general
conditions under which expansions of this type exist are still unknown.
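
The gain obtained from the first correction term can be seen in a case where $G_n(x)$ is known exactly. The Python sketch below takes the $x_\nu$ to be unit exponential variables (m = 1, $\sigma$ = 1), for which the sum has a gamma distribution with an elementary d.f. It assumes the classical form of the first term of the expansion, $-\frac{\gamma_1}{6\sqrt{n}}(x^2 - 1)\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, with skewness $\gamma_1 = 2$ for the exponential law; the function names and the choice n = 10 are illustrative assumptions.

```python
import math

def normal_df(x):
    """The normal d.f. Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def exact_df(n, x):
    """Exact d.f. of (x_1 + ... + x_n - n) / sqrt(n) for unit exponential x_i."""
    s = n + x * math.sqrt(n)
    if s <= 0.0:
        return 0.0
    term, partial = 1.0, 1.0          # term = s^k / k!, starting at k = 0
    for k in range(1, n):
        term *= s / k
        partial += term
    return 1.0 - math.exp(-s) * partial

def one_term_df(n, x):
    """Phi(x) plus the first correction term, of order n^(-1/2)."""
    skewness = 2.0                    # third standardized moment of the unit exponential law
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return normal_df(x) - skewness * (x * x - 1.0) * density / (6.0 * math.sqrt(n))

n = 10
errors = {}
for x in (-2.0, 0.0, 2.0):
    exact = exact_df(n, x)
    errors[x] = (abs(normal_df(x) - exact), abs(one_term_df(n, x) - exact))
    print(x, exact, errors[x])
```

At each of the three points the one-term approximation is markedly closer to the exact value than $\Phi(x)$ alone.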

It follows from (11.3) that the difference $G_n(x) - \Phi(x)$ is, for any fixed x,
of the order $n^{-1/2}$ as $n \to \infty$. It is often important to know the asymptotic
behaviour of $G_n(x)$ when n and x increase simultaneously, and in that case (11.3)
yields only a trivial result. This case has been investigated by Cramér [10],
and Feller [29], and the results so far obtained permit important applications to
the so called law of the iterated logarithm (cf. below). However, it seems likely
that similar results may be obtained in considerably more general cases than those
hitherto investigated.

A further interesting type of problems belonging to this order of ideas may
be approached in the following way. Consider the variables (11.1) in the
particular case when $x_1, x_2, \ldots$ are independent variables all having the same
d.f. F(x). When the $a_n$ and $b_n$ can be found such that the d.f. of the normalized
sum $u_n$ tends to $\Phi(x)$, we shall say that F belongs to the domain of attraction of
the normal law. Feller's theorem gives a NS condition that this should be so.
Now when this condition is not satisfied, it may still occur that the $a_n$ and $b_n$
can be so chosen that the d.f. of $u_n$ tends to a limiting d.f. $\Psi(x)$, which is
necessarily different from $\Phi(x)$. Then it is easily seen that $\Psi(x)$ must be a stable
distribution, with its c.f. defined by (9.1), and it is natural to say that F belongs
to the domain of attraction of $\Psi$. Necessary and sufficient conditions that this
should hold have been given by Doeblin [16], and Gnedenko [34]. When the $a_n$ and
$b_n$ cannot be found such that the d.f. of the normalized sum $u_n$ converges to a limit,
it may still be possible to obtain a limiting d.f. by considering only a partial
sequence $u_{n_1}, u_{n_2}, \ldots$. Khintchine [47] has proved the interesting theorem
that the totality of limiting d.f.'s that may be obtained in this way coincides
with the class of infinitely divisible d.f.'s defined by (9.2). There are also
further results in the same direction given by Bawly [2], Khintchine [44], Lévy,
[61]-[63], and Gnedenko, [35].

12. The law of the iterated logarithm. Consider a sequence of independent
variables $x_1, x_2, \ldots$, such that the mean $E x_n = 0$ for all n, while the variances
$E x_n^2 = \sigma_n^2$ are finite. Put $s_n^2 = \sigma_1^2 + \cdots + \sigma_n^2$, and suppose that the variables
obey the central limit law with $a_n = s_n$, $b_n = 0$. (In particular this will be
the case when all $x_n$ have the same distribution.) For any function $\psi(n)$ tending
to infinity with n we then have

$$(12.1) \qquad \lim_{n \to \infty} P(|z_n| > s_n \psi(n)) = 0.$$

On the other hand, if $\psi(n)$ tends to a finite limit $\ge 0$, the same probability
has a positive limit.

It seems natural to consider the relation within the brackets in (12.1) not
only for a single large value of n, but to require the probability that this relation
holds simultaneously for an infinite number of values of n. The development
of this problem has led to the so called law of the iterated logarithm.

We shall in this respect use the following terminology due to Lévy. A
non-decreasing positive function $\psi(n)$ will be said to belong to the lower class with
respect to the variables $x_n$ if, with a probability equal to one, there are infinitely
many n such that

$$|z_n| > s_n \psi(n).$$

On the other hand, $\psi(n)$ will be said to belong to the upper class if the
probability of the same property is equal to zero.

Every $\psi(n)$ belongs to one of these two classes. This is a special case of the
so called null-or-one law: if S is a Borel set in the space of the independent random
variables $x_1, x_2, \ldots$, such that any two points differing at most in a finite number
of coordinates either both belong to S or both belong to the complementary
set, then P(S) can only assume the values 0 or 1.

It was proved by Kolmogoroff [51] that, subject to certain restrictions, the
function

$$\psi(n) = \sqrt{c \log \log s_n}$$

belongs to the lower class for any c < 2, and to the upper class for any c > 2,
which may be expressed by the relation

$$P\left(\limsup_{n \to \infty} \frac{|z_n|}{s_n \sqrt{2 \log \log s_n}} = 1\right) = 1.$$

More general results were proved by Feller [30], who proved i.a. that, subject to
certain restrictions, $\psi(n)$ belongs to the lower or upper class according as

$$(12.2) \qquad \sum_{n} \frac{\sigma_n^2}{s_n^2}\, \psi(n)\, e^{-\psi^2(n)/2}$$

is divergent or convergent (in certain special cases, this had been previously
found by Kolmogoroff and Erdös [24]). Feller also proved a more complicated
result, which contains the above as a particular case, and from which
it follows that the simple criterion (12.2) no longer holds when the restrictions
imposed in its proof are removed.


13. Convergence of series. For any sequence of random variables $x_n$, the
probability

$$P\left(\sum_n x_n \ \text{converges}\right)$$

has a uniquely determined value. When the $x_n$ are independent, it follows from
the null-or-one law that this probability is either 0 or 1. By a theorem of
Khintchine and Kolmogoroff [48], the value 1 is assumed when and only when
the three series

$$\sum_n \int_{|x| > 1} dF_n(x), \qquad \sum_n E\, y_n, \qquad \sum_n \sigma^2(y_n)$$

are convergent, where

$$y_n = \begin{cases} x_n & \text{when } |x_n| \le 1, \\ 0 & \text{when } |x_n| > 1. \end{cases}$$
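
As an illustration of the three-series criterion (a sketch under stated assumptions, not taken from the text), consider independent $x_n$ taking the values $+c_n$ and $-c_n$ with probability 1/2 each, where $0 < c_n \le 1$. Then $y_n = x_n$, the first two series vanish term by term, and everything depends on the variance series $\sum c_n^2$. The Python snippet compares $c_n = 1/n$ with $c_n = 1/\sqrt{n}$; the function name and the number of terms are arbitrary choices.

```python
def variance_series(scale, n_terms):
    """Partial sum of the variance series for x_n = +scale(n) or -scale(n), each with probability 1/2.

    Assumes 0 < scale(n) <= 1, so that y_n = x_n, P(|x_n| > 1) = 0 and E y_n = 0."""
    return sum(scale(n) ** 2 for n in range(1, n_terms + 1))

fast = variance_series(lambda n: 1.0 / n, 100000)          # terms 1/n^2, bounded partial sums
slow = variance_series(lambda n: 1.0 / n ** 0.5, 100000)   # terms 1/n, partial sums grow like log n
print(fast, slow)
```

The first partial sum stays below $\pi^2/6$, so that $\sum \pm 1/n$ converges with probability one, while the second grows without bound, so that $\sum \pm 1/\sqrt{n}$ converges with probability zero.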

For the case when the $x_n$ are not assumed to be independent, various results
have been given by Lévy [63] and others, but our knowledge of the properties
of these series is still not very advanced.

14. Generalizations. In several instances it has been pointed out above
that the results concerning sums of independent variables may, to a certain
extent, be extended to cases when the variables are not independent. Generally
the independence condition has then to be replaced by some condition restricting
the degree of dependence. Results of this type were first given by Bernstein
[3], and then in more general cases by Lévy [62], [63], and Loève [67]. However,
this field has so far only been very incompletely explored.

Similar remarks apply to the generalization of the various theorems quoted
above to cases of variables and distributions in more than one dimension.

III. STOCHASTIC PROCESSES 

15. The theory of random variables in a finite number of dimensions is able
to deal adequately with practically all problems considered in classical
probability theory. However, during the early years of the present century, there
appeared in the applications various problems, where it proved necessary to
consider probability relations bearing on infinite sequences of numbers, or even
on functions of a continuous variable.

The mathematical set-up required for the study of such problems involves
the introduction of probability distributions in spaces of random sequences or
random functions (cf. 5 above). Generally, any process in nature which can be
analyzed in terms of probability distributions in spaces of these types will be
called a stochastic process. It is convenient to apply this name also to the
probability distribution used for the study of the process. We shall thus say, e.g.,
that a certain random function x(t) is attached to the stochastic process which
is defined by the probability distribution of x(t). In the majority of applications,
the variable t will represent the time, and we shall often use a terminology
directly referring to this case. However, there are also other types of problems
in the applications (t may e.g. be a spatial variable in an arbitrary number of
dimensions), and it is obvious that the purely mathematical problems connected
with these classes of probability distributions will have to be considered quite
independently of any concrete interpretation of the variable t or the function x(t).

A well-known example of this type of problems is afforded by the Brownian
movement. Let x(t) be the abscissa at the time t of a small particle immersed
in a liquid, and subject to molecular impacts. In every instant, the quantity
x(t) receives a random impulse, and the problem arises to study the behaviour
of x(t). According as we are content to consider x(t) for a discrete sequence
of t-points, say for t = 0, 1, 2, ..., or we wish to consider all positive values of t,
we shall then have to introduce a probability distribution in the space of the
random sequence x(0), x(1), ..., or in the space of the random function x(t),
where t > 0. We may then discuss such questions as the distribution of x(t)
for a given value of t, the joint and conditional distributions of x(t) for two or
more values of t, and, in the case of a continuous variable t, continuity,
differentiability and other similar properties of the random function x(t).

Wiener [82], [83] (cf. also Paley and Wiener [74]) was the first to give a rigorous
treatment of this process. He proved in 1923 that it is possible to define a
probability distribution in a suitably restricted functional space, such that the
increment $\Delta x(t) = x(t + \Delta t) - x(t)$ is independent of x(t) for any $\Delta t > 0$.
With a probability equal to 1, the function x(t) is continuous for all t > 0, and
for any fixed t > 0, the random variable x(t) is normally distributed.
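
A discrete approximation to such a process is easily simulated. The Python sketch below builds sample paths from independent normal increments; the normalization x(0) = 0 with increment variance equal to $\Delta t$, the function name, the seed and the sample sizes are all assumptions made for this illustration and are not specified in the text.

```python
import random

def wiener_path(t_end, steps, rng):
    """Values x(0), x(dt), ..., x(t_end) built from independent normal increments.

    Assumes x(0) = 0 and that each increment has mean 0 and variance dt."""
    dt = t_end / steps
    path = [0.0]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, dt ** 0.5))
    return path

rng = random.Random(12345)
paths = [wiener_path(1.0, 10, rng) for _ in range(20000)]

end_values = [p[-1] for p in paths]               # samples of x(1)
mean_end = sum(end_values) / len(end_values)
var_end = sum((v - mean_end) ** 2 for v in end_values) / len(end_values)

# Sample covariance between x(0.5) and the later increment x(1) - x(0.5).
mid = [p[5] for p in paths]
inc = [p[10] - p[5] for p in paths]
mean_mid = sum(mid) / len(mid)
mean_inc = sum(inc) / len(inc)
cov = sum((a - mean_mid) * (b - mean_inc) for a, b in zip(mid, inc)) / len(mid)
print(mean_end, var_end, cov)
```

Under these assumptions x(1) has variance close to 1, and the sample covariance between x(0.5) and the subsequent increment is close to zero, as the independence of the increment from x(t) requires.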

Another example of stochastic processes studied at this stage occurs in the
theory of risk of an insurance company. Let x(t) denote the total amount
of claims up to the time t in a certain insurance company. As in the case of
the Brownian movement, it may seem natural to assume that the increment
$\Delta x(t)$ is independent of x(t). On the other hand, x(t) is in this case an
essentially discontinuous function, which is never decreasing, and increases only by
jumps of varying magnitudes occurring for certain discrete values of t, which are
not a priori known. Processes of this type were studied by F. Lundberg [69],
[70], H. Cramér [6] and others.

Further examples of particular processes were discussed in connection with
various applications, but no general theory of the subject existed until 1931,
when Kolmogoroff published a basic paper [53] dealing with the class of stochastic
processes which will here be denoted as Markoff processes (Kolmogoroff uses the
term “stochastically definite processes”), of which the two examples mentioned
above form particular cases. The theory of this class of processes was further
developed by Feller [26], [28]. In 1934, Khintchine [42] introduced another
important class of processes known as stationary processes. From 1937, the
general theory of the subject was subjected to a penetrating analysis in a series
of important works by Doob [18]-[22].³

16. Probability distributions in functional spaces. We have seen in 6 
above how a probability distribution in the space of all functions x(t) may be 
defined, when t varies in an arbitrary space T. Generally, we shall here 
content ourselves to consider the cases when T is the set of all real numbers, or the 
set of all non-negative real numbers. Most results obtained for these cases 
will be readily generalized to cases when t varies in a Euclidean space of a finite 
number of dimensions. On the other hand, when T is enumerable, say 
consisting of the points $t = 0, \pm 1, \pm 2, \ldots$, so that we are concerned with a random 
sequence $x(0), x(\pm 1), \ldots$, the results for the continuous case will generally 
hold and assume a simpler form which will not be particularly stated here. 

³ A further interesting paper by Doob has appeared while the present paper was being 
printed: “Probability in function space”, Bull. Amer. Math. Soc., Vol. 53 (1947), pp. 15-30. 

The case when T is a space of an infinite number of dimensions does not seem 
to have been considered so far. 

In the present paragraph, it will be convenient to assume the function x(t) 
to be real-valued, but the generalization to a complex-valued x(t) requires 
only obvious modifications. In the sequel we shall sometimes consider the 
real-valued and sometimes the complex-valued case, according as the occasion 
requires. 

Let now X be the space of all real-valued functions x(t) of the real variable 
t, where $-\infty < t < \infty$. According to 5, a probability measure P(S) is uniquely 
defined for all Borel sets S in X by means of the family of joint distributions 
of all finite sequences $x(t_1), \ldots, x(t_n)$. In fact, P(S) can be defined for a more 
general class of sets than the Borel sets. For any set S in X, we may define 
an outer P-measure $\bar{P}(S)$ as the lower bound of P(Z) for all sums Z of finite or 
enumerable sequences of intervals, such that $S \subset Z$. Further, the inner 
P-measure $\underline{P}(S)$ is defined by the relation $\underline{P}(S) = 1 - \bar{P}(X - S)$. When the 
outer and inner measures are equal, S is called P-measurable, and P(S) is defined 
as their common value. Any P-measurable set differs from a Borel set by a 
set of P-measure zero. 

In many cases, this definition will be sufficient for an adequate treatment 
of the problems that we wish to consider. However, in other cases we encounter 
certain characteristic difficulties, which make it desirable to consider the 
possibility of amending the basic definition. Thus it often occurs that we are 
interested in the probability that the random function x(t) satisfies certain 
regularity conditions in a non-enumerable set of points t. We may, e.g., wish 
to consider the probability that x(t) is continuous for all t, that x(t) should 
be Lebesgue-measurable for all t, that $x(t) \le k$ for all t, etc. Let S denote the 
set of all functions satisfying a condition of this type. It can then be shown 
that the inner measure $\underline{P}(S)$ is always equal to zero so that S is never 
measurable, except in the (usually trivial) case when $\bar{P}(S) = 0$. 

Consequently many interesting probabilities are left undetermined by the 
general definition of a probability distribution in X given above. The 
possibility of modifying the definition so as to enable us to study probabilities of 
this type has been thoroughly investigated by Doob [18]. He considers a 
subspace $X_0$ of the general functional space X, where $X_0$ is chosen so as to 
contain only, or almost only, “desirable” functions, i.e. functions satisfying 
such regularity conditions as seem natural with respect to the problem under 
investigation. We start from a given probability measure P(S) in X, and ask 
if it is possible to define a probability measure in the restricted space $X_0$, which 
corresponds in some natural way to the given distribution in X. Let $S_0$ be 
a set in $X_0$, and suppose that it is possible to find a P-measurable set S in X 
such that $S X_0 = S_0$. According to Doob, a probability measure $P_0$ in $X_0$ 
is then uniquely defined by the relation 

$$P_0(S_0) = P(S)$$

if and only if the condition 

$$\bar{P}(X_0) = 1$$

is satisfied. 

The problem is thus reduced to finding a subspace $X_0$ of outer P-measure 1, 
such that $X_0$ contains only functions of sufficiently regular behaviour. When 
this can be done, we can restrict ourselves to consider only functions x(t) 
belonging to $X_0$, the probability distribution in this space being defined by the 
measure $P_0$. We shall then say that x(t) is a random function, attached to a 
stochastic process with the restricted space $X_0$. Doob has obtained a great 
number of interesting results in this connection, e.g. with respect to the problem 
of choosing $X_0$ such that it contains almost only Lebesgue-measurable functions, 
or such that the probability of the relation $x(t) \le k$ has a well-defined value for 
all k. In particular he has shown that the last problem can be solved for 
any given P-measure. However, our knowledge of the various possibilities 
which exist with respect to the choice of $X_0$ is still very incomplete, and it seems 
likely that further important results may be reached along this line of research. 

An alternative method of introducing probability distributions in functional 
spaces has been used by Wiener [82], [83] (cf. also Paley and Wiener [74]). 
Consider a given probability measure $\Pi$ in an arbitrary space $\Omega$, defined for all 
sets $\Sigma$ of an additive class C. Let $x(t, \omega)$ denote a function (real- or 
complex-valued, as the case may be) of the arguments t (real) and $\omega$ (point in $\Omega$), such that 
$x(t, \omega)$ for every fixed t becomes a C-measurable function of $\omega$. On the other 
hand, when $\omega$ is fixed, $x(t, \omega) = x(t)$ reduces to a function of the real variable t. 
Let $X_0$ denote the set of all functions x(t) corresponding in this way to points of 
$\Omega$. Further, let $S_0 = S X_0$, where S is a Borel set in X, and let $\Sigma$ denote the set 
of all points $\omega$ such that $x(t, \omega) \subset S_0$. Then $\Sigma$ belongs to C, and a probability 
measure $P_0$ in the functional space $X_0$ is uniquely defined by the relation 

$$P_0(S_0) = \Pi(\Sigma). \tag{16.1}$$
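
The relation (16.1) may be illustrated on a deliberately small example. The sketch below assumes a base space of four equally probable points and the family of sample functions x(t, w) = w*t; both choices, and all names in the code, are hypothetical and serve only to show how the measure of a set of functions is read off from the measure of the corresponding set of points.

```python
from fractions import Fraction

# Hypothetical finite base space Omega with a uniform probability measure Pi.
OMEGA = [0, 1, 2, 3]
PI = {w: Fraction(1, 4) for w in OMEGA}

def x(t, w):
    """Sample function x(t, w): for fixed w this is a function of t alone."""
    return w * t

def induced_measure(condition):
    """Return P_0(S_0) = Pi(Sigma), where Sigma collects the points w whose
    sample function satisfies `condition` (a predicate on functions of t)."""
    sigma = [w for w in OMEGA if condition(lambda t, w=w: x(t, w))]
    return sum((PI[w] for w in sigma), Fraction(0))

# S_0: the functions of X_0 with x(1) <= 1 and x(2) <= 2.
p = induced_measure(lambda f: f(1) <= 1 and f(2) <= 2)
```

Here only the points w = 0 and w = 1 give functions in the chosen set, so the induced probability is 1/2.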

The relations between the two modes of definition have been discussed by 
Doob and Ambrose [23], who have shown that they are largely equivalent. 
However, it seems likely that in particular problems the one or the other 
procedure may sometimes be the more advantageous, and further investigations 
on this subject seem desirable. 

17. Processes with a finite mean square. Consider a stochastic process 
defined by a probability measure P(S) in the space X of all complex-valued 
functions x(t) of the real variable t. For any fixed $t_0$, the random variable 
$x(t_0)$ is then a complex-valued function of the variable point x(t) in the space 
X, i.e. a point $Q_{t_0}$ in the space $\Omega$ of all complex-valued functions defined on X. 
When $t_0$ varies, the point $Q_{t_0}$ describes a “curve” in $\Omega$, which then corresponds 
to our stochastic process. 

Suppose, in particular, that the mean square 

$$E|x(t)|^2 = \int_X |x(t)|^2 \, dP$$

is finite for any fixed value of t. This implies that for fixed t the function 
x(t) belongs to $L_2$ over X, relative to the probability measure P. The random 
variable x(t) may then be regarded as an element of the Hilbert space H of all 
complex-valued functions f belonging to $L_2$ over X, the inner product (f, g) of 
two elements f and g being defined by the relation 

$$(f, g) = \int_X f \bar{g} \, dP = E(f \bar{g}).$$

The stochastic process to which x(t) is attached then corresponds to a “curve” 
in H (Kolmogoroff, [56], [57]), so that the well-known theory of Hilbert space is 
available for the study of the process. In particular, convergence in the usual 
metric of Hilbert space is equivalent to convergence in the mean of order 2 for 
random variables. 

Let $H_x$ be the smallest closed linear subspace of H which contains all elements 
of the form $a_1 x(t_1) + \cdots + a_n x(t_n)$. If the covariance function 

$$r(t, u) = (x(t), x(u)) = E(x(t)\overline{x(u)})$$

is continuous for all real values of t and u, then $x(t) \to x(t_0)$ in the mean, as 
$t \to t_0$, and we shall say that the process x(t) is continuous. For any continuous 
process, $H_x$ is separable. When g(t) is a continuous non-random function of t, 
and x(t) is attached to a continuous stochastic process, the Riemann-Darboux 
sums formally associated with the integral 

$$\int_a^b g(t) x(t) \, dt$$

are easily shown to tend to a limit y, which is an element of $H_x$, i.e. a random 
variable. By definition, we may identify the integral with this variable y, 
and this integral will possess the essential properties of the ordinary Riemann 
integral (Cramér, [12]). 

The application of the theory of Hilbert space to stochastic processes seems 
to open very interesting possibilities. Some applications to particular classes 
of stochastic processes will be mentioned below. Further important results 
belonging to this order of ideas will be given in a work by K. Karhunen [40], which 
is in course of publication. 

18. Relations to ergodic theory. There is a close connection between the 
theory of stochastic processes and ergodic theory. In ergodic theory, as 
summarized e.g. in the treatise of E. Hopf [38], we consider an arbitrary space $\Omega$, 
and a probability measure $\Pi$, defined for all sets $\Sigma$ belonging to the additive 
class C. We further consider a one-parameter group of one-one transformations 
of $\Omega$ into itself (a “flow” in $\Omega$) such that the transformation corresponding to 
the parameter value t takes the point $\omega = \omega_0$ into $\omega_t$, while $(\omega_t)_u = \omega_{t+u}$. Let 
$f(\omega)$ be a given function, defined throughout $\Omega$, and such that $f(\omega_t)$ is 
C-measurable for every fixed t. The well-known ergodic theorems due to von Neumann, 
Birkhoff, Khintchine and others are then concerned with the asymptotic 
behaviours of mean values, which in the classical cases are of the types 

$$\frac{f(\omega_0) + f(\omega_1) + \cdots + f(\omega_{n-1})}{n}$$

or 

$$\frac{1}{T} \int_0^T f(\omega_t) \, dt$$

as n or T tends to infinity. (In the case of the latter expression, it is necessary 
to introduce some additional condition implying measurability in t.) 

Writing $x(t, \omega) = f(\omega_t)$, it is seen that to a given transformation group $\omega \to \omega_t$, 
and a given function $f(\omega)$, there corresponds a stochastic process in the sense of 
Wiener’s definition (cf. 16). The space $X_0$ of this process consists of all functions 
x(t) representable in the form $x(t) = f(\omega_t)$, when $\omega = \omega_0$ varies over $\Omega$. The 
corresponding probability measure $P_0$ is defined by (16.1). 

Thus any of the above-mentioned ergodic theorems may be expressed as a 
theorem concerning “temporal” mean values of the types 

$$\frac{x(0) + x(1) + \cdots + x(n-1)}{n}$$

or 

$$\frac{1}{T} \int_0^T x(t) \, dt.$$

If, according to some reasonable convergence definition, we may assign a limit 
to either of these expressions, as n or T tends to infinity, this limit will be a 
random variable, and it is important to find conditions which imply that this 
variable has a constant value for “almost all” functions x(t), i.e. for all x(t) 
except at most a set of $P_0$-measure zero. 
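
A simple numerical illustration of a temporal mean tending to a constant is provided by a rotation of the unit interval, taken here in discrete time. This is only a sketch: the space, the rotation angle and the function f are all assumptions chosen for convenience, and the uniform measure on the interval plays the part of the probability measure.

```python
import math

def temporal_mean(f, omega0, alpha, n):
    """Average f along the orbit omega_k = omega0 + k*alpha (mod 1)."""
    total = 0.0
    for k in range(n):
        total += f((omega0 + k * alpha) % 1.0)
    return total / n

def f(omega):
    """Test function whose mean over [0, 1) under the uniform measure is 0."""
    return math.cos(2.0 * math.pi * omega)

alpha = math.sqrt(2.0) - 1.0   # irrational rotation angle (assumed example)
means = [temporal_mean(f, w0, alpha, 10_000) for w0 in (0.0, 0.3, 0.77)]
```

For each of the three starting points the temporal mean over 10,000 steps is close to 0, the mean of f over the whole space, so that the limit does not depend on the particular function x(t) selected.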

In the particular case when x(0), x(1), ... are independent variables all 
having the same distribution, the classical ergodic theorems yield simple cases 
of the laws of large numbers (cf. 10). The mean ergodic theorem of von 
Neumann gives the weak law, while the Birkhoff-Khintchine theorem gives the 
strong law. Some more general results belonging to this order of ideas will be 
mentioned in the sequel. 

It will be seen that the two theories are largely equivalent, and it seems 
likely that further comparative studies of the methods will be of great value to 
both sides. 

19. Markoff processes. Consider now a stochastic process, defined by a 
probability measure P(S) in the space X of all real-valued functions x(t) of the 
real variable t. For any $t_1 < t_2$, there is a certain conditional probability 
$P(x(t_2) \subset S \mid x(t_1) = a_1)$ of the relation $x(t_2) \subset S$, relative to the hypothesis 
that $x(t_1)$ assumes the given value $a_1$. Suppose now that this conditional 
probability is independent of any additional hypothesis concerning the behaviour of 
x(t) for $t < t_1$, so that we have e.g. for any $t_0 < t_1 < t_2$ and for any $a_0$ 

$$P(x(t_2) \subset S \mid x(t_1) = a_1) = P(x(t_2) \subset S \mid x(t_1) = a_1, \; x(t_0) = a_0).$$

In this case the process is called a Markoff process. 

The general theory of this type of processes, which forms a natural 
generalization of the classical concept of Markoff chains, has been studied in basic 
works by Kolmogoroff [53] and Feller [26], [28]. Writing 

$$P(x(t) \le \xi \mid x(t_0) = a_0) = F(\xi; t, a_0, t_0),$$

where $t_0 < t$, F will be the distribution function of the random variable x(t), 
relative to the hypothesis $x(t_0) = a_0$. Then F satisfies the 
Chapman-Kolmogoroff equation 

$$F(\xi; t, a_0, t_0) = \int_{-\infty}^{\infty} F(\xi; t, \eta, t_1) \, d_\eta F(\eta; t_1, a_0, t_0), \tag{19.1}$$

which expresses that, starting from the state $x(t_0) = a_0$, the state $x(t) \le \xi$ 
must be reached by passing through some intermediate state $x(t_1) = \eta$, where 
$t_0 < t_1 < t$. Subject to certain general conditions, it is possible to show that 
any solution of this equation satisfies certain integro-differential equations, 
which in some important cases reduce to partial differential equations of 
parabolic type, and that the d.f. F is uniquely determined by these equations. 
However, the general conditions mentioned above are in many cases difficult to apply 
to particular classes of processes, and it would be important to have further 
investigations concerning these questions. 

Markoff processes (not belonging to the subclass of differential processes, 
which will be considered in the following paragraph) appear in several important 
applications, e.g. in the theory of cosmic radiation, in certain genetical problems, 
in the theory of insurance risk etc. In these cases, we are often concerned with 
the class of purely discontinuous Markoff processes, where the function x(t) 
only changes its value by jumps. If, in addition, there are only a finite or 
enumerable set of possible values for x(t), the Chapman-Kolmogoroff equation 
(19.1) reduces to 

$$\pi_{ik}(t_0, t) = \sum_j \pi_{ij}(t_0, t_1) \, \pi_{jk}(t_1, t), \tag{19.2}$$

where $\pi_{ik}(t_0, t)$ denotes the “transition probability”, i.e. the probability that 
x(t) will be in the kth state at the time t, when it is known to have been in the 
ith state at the time $t_0$. In matrix form, this equation may be written 

$$\Pi(t_0, t) = \Pi(t_0, t_1) \, \Pi(t_1, t), \tag{19.3}$$

where $\Pi$ denotes the matrix of the $\pi_{ik}$. 
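
The matrix form (19.3) is easily checked numerically in the simplest case. The sketch below assumes a temporally homogeneous process with two states and constant jump rates a and b, for which the transition matrix over a time span s is known in closed form; the rates and the three time points are hypothetical.

```python
import math

def transition_matrix(s, a=1.5, b=0.5):
    """Transition matrix over a time span s for a two-state process.

    a is the jump rate from state 0 to state 1 and b the rate from 1 to 0
    (both hypothetical). Entry [i][k] is the probability of being in state k
    at time t0 + s, given state i at time t0.
    """
    r = a + b
    e = math.exp(-r * s)
    return [[(b + a * e) / r, (a - a * e) / r],
            [(b - b * e) / r, (a + b * e) / r]]

def matmul(p, q):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(p[i][j] * q[j][k] for j in range(2)) for k in range(2)]
            for i in range(2)]

t0, t1, t = 0.0, 0.7, 2.0
lhs = transition_matrix(t - t0)
rhs = matmul(transition_matrix(t1 - t0), transition_matrix(t - t1))
max_gap = max(abs(lhs[i][k] - rhs[i][k]) for i in range(2) for k in range(2))
```

The two sides agree up to rounding error for any intermediate time between t0 and t, which is the content of (19.3) in this special case.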
When only a sequence of discrete values of t are considered, we have here 
the classical case of Markoff chains, which has received a detailed treatment 
in the well-known book by Fréchet [32] (cf. also Doob, [19]). The case when t 
is a continuous variable has been treated by Feller [28], O. Lundberg [71], 
Arley [1], and other authors. Some of the most important problems of this 
branch of the subject are concerned with the existence of a unique system of 
solutions of (19.2) or (19.3), and with the asymptotic behaviour of the 
solutions for large values of $t - t_0$. Though important results have been reached, 
there still remains much to be done here, and the same thing holds a fortiori 
with respect to the analogous problems for general Markoff processes. 

20. Differential processes. A particularly interesting case of a Markoff 
process arises when, for any $\Delta t > 0$, the increment $\Delta x(t) = x(t + \Delta t) - x(t)$ 
is independent of $x(\tau)$ for $\tau \le t$. The process is then called a differential process. 
Some of the earliest studied stochastic processes belong to this class, which 
contains in particular the two examples discussed above in 15. Further cases 
of such processes arise e.g. in the theory of radioactive disintegration and in 
telephone technique. 

Let us suppose that x(0) is identically equal to zero, and that the process is 
uniformly continuous in probability in every finite interval $0 \le t \le T$, i.e. 
that for any fixed positive $\epsilon$ 

$$P(|x(t + \Delta t) - x(t)| > \epsilon) \to 0$$

as $\Delta t \to 0$, uniformly for $0 \le t \le T$. Then it follows from the works of Lévy 
[60], [63], Khintchine [47] and Kolmogoroff [54] that, for any t > 0, the random 
variable x(t) has an infinitely divisible distribution, with a characteristic 
function $\varphi(z; t)$ given by (9.2), where $\beta$, $\gamma$, M(u) and N(u) may depend on t. 

In the particularly important case when the distribution of the increment 
$x(t + \Delta t) - x(t)$ does not involve t, but only depends on the length $\Delta t$ of the 
interval, we say that the process is temporally homogeneous, and in this case 
we have 

$$\log \varphi(z; t) = t \log \varphi(z; 1),$$

so that we obtain the general formula for $\varphi(z; t)$ simply by replacing in (9.2) 
$\beta$, $\gamma$, M(u) and N(u) by $t\beta$, $t\gamma$, tM(u) and tN(u) respectively. 

When $t \to \infty$, or $t \to 0$, the appropriately normalized distribution of x(t) 
tends, under certain conditions, to a stable distribution (Cramér [7], 
Gnedenko [36]). When this limiting distribution is normal, there are sometimes 
even asymptotic expansions analogous to (11.3). Still, the problem of the 
asymptotic behaviour of the distribution for large t does not seem to be definitely 
cleared up. 

Khintchine [41] and Gnedenko [37] have given interesting generalizations 
of the law of the iterated logarithm (cf. 12) to processes of the type considered 
here. 
The continuous process discussed in 15 in connection with the Brownian 
movement corresponds to the temporally homogeneous case when $\beta$, M(u) and 
N(u) all reduce to zero, so that 

$$\varphi(z; t) = e^{-\gamma t z^2},$$

which shows that the distribution of x(t) is normal, with mean zero and 
variance $2\gamma t$. 
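
That the variance grows as 2*gamma*t when independent normal increments are added may be checked by a small Monte Carlo experiment. The values of gamma and t, the number of increments, the number of draws and the seed below are arbitrary assumptions; the code is a sketch rather than a definitive procedure.

```python
import random
import statistics

def sample_x(t, gamma, n_steps, rng):
    """One draw of x(t) as a sum of n_steps independent normal increments,
    each with mean 0 and variance 2*gamma*t/n_steps."""
    sd = (2.0 * gamma * t / n_steps) ** 0.5
    return sum(rng.gauss(0.0, sd) for _ in range(n_steps))

rng = random.Random(12345)          # fixed seed so that the run is repeatable
gamma, t = 0.5, 3.0
draws = [sample_x(t, gamma, 20, rng) for _ in range(20_000)]
sample_mean = statistics.fmean(draws)
sample_var = statistics.pvariance(draws)
```

With these choices the theoretical variance is 2*gamma*t = 3, and the sample variance should fall close to that value, while the sample mean should be close to zero.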

On the other hand, in the applications to the theory of insurance risk, $\gamma$ is 
zero, while M(u) and N(u) are connected with the distribution of the various 
magnitudes of claims. In this type of applications, it is often very important 
to find the probability that x(t) satisfies an inequality of the form 

$$x(t) < a + bt$$

for all values of t. It follows from the discussion in 16 that the definition of 
a probability of this type is somewhat delicate. The problem, which can be 
regarded as an extended form of the classical problem of “the gambler’s ruin,” 
has been solved in certain particular cases. It leads to integral equations, 
which in the simplest case are of the Volterra, in other cases of the 
Wiener-Hopf type (Cramér [6], [13], Segerdahl [79], Täcklind [81]). 

21. Orthogonal processes. Consider now the case of a complex-valued 
x(t), and suppose that $E|x(t)|^2$ is finite for all t. Without restricting the 
generality, we may assume that Ex(t) = 0 for all t. 

Suppose now that instead of requiring, as in the case of a differential process, 
that the variables $x(\tau)$ and $\Delta x(t)$ should be independent when $\tau \le t$, we only 
lay down the less stringent condition that these variables should be 
non-correlated, i.e. that 

$$E(x(\tau)\overline{\Delta x(t)}) = 0.$$

We then obtain a process which is no longer necessarily of the Markoff type. 
The condition implies that, for any two disjoint intervals $(t_1, t_2)$ and $(t_3, t_4)$, 
we have 

$$E[(x(t_2) - x(t_1))\overline{(x(t_4) - x(t_3))}] = 0,$$

so that the “chords” corresponding to two disjoint “arcs” of the curve in 
Hilbert space representing the process are always orthogonal (Kolmogoroff 
[56], [57]). A process of this type may accordingly be called an orthogonal 
process. 

For a process of this type we have, writing $E|x(t)|^2 = F(t)$, 
$F(t + \Delta t) - F(t) = E|x(t + \Delta t) - x(t)|^2$, so that F(t) is a never decreasing function of t. 
If F(t) is bounded for all t, we shall say that the orthogonal process is bounded. 
For a bounded orthogonal process, the Stieltjes integral 

$$\int_{-\infty}^{\infty} g(t) \, dx(t),$$

where g(t) is bounded and continuous, may be defined as the limit in the mean 
of sums of the form 

$$\sum_\nu g(t_\nu)\,(x(t_\nu) - x(t_{\nu-1})).$$

22. Stationary processes. When we are concerned with a process representing 
the temporal development of a system governed by laws which are invariant 
under a translation in time, it seems natural to assume that the joint 
distribution of any group of variables of the form 

$$x(t_1 + \tau), \ldots, x(t_n + \tau) \tag{22.1}$$

is independent of $\tau$. A process satisfying this condition will be called a 
stationary process. If a stochastic process is defined by means of a “flow” $\omega \to \omega_t$ 
in a space $\Omega$ (cf. 18), the process will be stationary when and only when the 
corresponding flow is measure-preserving, i.e. if the transformation $\omega \to \omega_t$ 
changes any C-measurable set S into a set $S_t$ of the same measure. 

Under appropriate conditions with respect to the measurability of x(t), the 
Birkhoff-Khintchine ergodic theorem holds for a stationary process, i.e. there 
exists a random variable y such that we have 

$$P_0\left(\lim_{T \to \infty} \frac{1}{T} \int_0^T x(t) \, dt = y\right) = 1, \tag{22.2}$$

where $P_0$ is the probability measure in a suitably restricted space in the sense 
of Doob. Further work seems to be required here, in order to make the 
situation quite clear, also with regard to metric transitivity. 

For a stationary process, any finite moment of the joint distribution of the 
variables (22.1) is obviously independent of $\tau$. Suppose now that we only 
require that this invariance under translations in time should hold for moments 
of the first and second order of the joint distributions, which are assumed to 
be finite. The wider class of processes obtained in this way may be called 
stationary of the second order. Processes of this type have been studied for the 
first time by Khintchine [42]. We shall assume that x(t) is complex-valued. 
Without restricting the generality, we may further assume that Ex(t) = 0 for 
all t. The product moment $E(x(t)\overline{x(u)})$ will then be a function of the difference 
t - u: 

$$E(x(t)\overline{x(u)}) = R(t - u). \tag{22.3}$$

Assuming, in addition, that R(t) is continuous at t = 0, it follows that R(t) 
is continuous for all t, and the process is continuous in the sense of 17. It was 
shown by Khintchine that a necessary and sufficient condition that a given 
function R(t) should be associated with a second order stationary and continuous 
process by means of the relation (22.3) is that we should have 

$$R(t) = \int_{-\infty}^{\infty} e^{itx} \, dF(x) \tag{22.4}$$

for all t, where the spectral function F(x) is real, never decreasing and bounded. 
In particular, we have 

$$F(+\infty) - F(-\infty) = R(0) = E|x(t)|^2 = \sigma^2.$$
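
When the spectral function is a step-function, the integral in (22.4) reduces to a finite sum, and the stated properties of R(t) can be verified directly. The following sketch assumes three jump points with arbitrarily chosen saltuses; the jump locations, the weights and the test values are hypothetical.

```python
import cmath

# Hypothetical spectrum: F has the jump JUMPS[lam] at each point lam.
JUMPS = {-1.0: 0.5, 0.4: 1.0, 2.5: 0.25}

def R(t):
    """R(t) = integral of exp(i*t*x) dF(x) for the step function F above."""
    return sum(s2 * cmath.exp(1j * t * lam) for lam, s2 in JUMPS.items())

def quadratic_form(ts, zs):
    """Sum over j, k of z_j * conj(z_k) * R(t_j - t_k)."""
    return sum(zj * zk.conjugate() * R(tj - tk)
               for tj, zj in zip(ts, zs) for tk, zk in zip(ts, zs))

total_variance = R(0.0)   # equals F(+inf) - F(-inf), the sum of the jumps
q = quadratic_form([0.0, 0.3, 1.1], [1 + 2j, -0.5j, 0.7 + 0j])
```

Here R(0) equals the total increase 1.75 of F, R(-t) is the complex conjugate of R(t), and the quadratic form is real and non-negative, as must be the case for the covariance function of a process.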

Khintchine’s condition for R(t) was generalized by Cramér to the case of an 
arbitrary number of processes $x_1(t), \ldots, x_n(t)$, such that the product moments 
$E(x_i(t)\overline{x_j(u)})$ are functions of the difference t - u. The corresponding spectral 
functions $F_{ij}(x)$ are in general complex-valued and of bounded variation. 
Further, the expression (Cramér, [12]) 

$$\sum_{i,j=1}^{n} z_i \bar{z}_j \, \Delta F_{ij},$$

where $\Delta F_{ij} = F_{ij}(b) - F_{ij}(a)$, is, for any a < b, a non-negative Hermite form in 
the variables $z_i$. This result is closely connected with a theorem on Hilbert 
space considered by Kolmogoroff and Julia. It is further shown that, to any 
given functions $F_{ij}(x)$, (i, j = 1, ..., n), satisfying these conditions, we can 
always find n processes $x_1(t), \ldots, x_n(t)$ such that the joint distribution of any set 
of variables is always normal, while the covariance functions $R_{ij}(t - u) = 
E(x_i(t)\overline{x_j(u)})$ are given by the expression 

$$R_{ij}(t) = \int_{-\infty}^{\infty} e^{itx} \, dF_{ij}(x).$$

For a process x(t) which is continuous and stationary of the second order, 
with Ex(t) = 0 for all t, we have the mean ergodic theorem 

$$\operatorname*{l.i.m.}_{T \to \infty} \frac{1}{T} \int_0^T e^{-i\lambda t} x(t) \, dt = y \tag{22.5}$$

for any real $\lambda$. The random variable y has the mean 0 and the variance 
$F(\lambda + 0) - F(\lambda - 0)$, where F is the spectral function appearing in (22.4). If $\lambda$ is a 
point of continuity for F, it thus follows that y = 0 with a probability equal 
to 1. On the other hand, if $\lambda$ is a discontinuity, y has a positive variance. Let 
$\lambda_1, \lambda_2, \ldots$ be all the discontinuities of F(x), and let $\sigma_1^2, \sigma_2^2, \ldots$ be the 
corresponding saltuses, while $y_1, y_2, \ldots$ are the limits in the mean obtained from 
(22.5) for $\lambda = \lambda_1, \lambda_2, \ldots$. Then two different $y_\nu$ are always orthogonal: 
$E(y_j \bar{y}_k) = 0$ for $j \ne k$, and we have 

$$x(t) = \sum_\nu y_\nu e^{i\lambda_\nu t} + \xi(t), \tag{22.6}$$

where $E\xi(t) = 0$ and 

$$E|\xi(t)|^2 = \sigma^2 - \sum_\nu \sigma_\nu^2.$$

If F(x) is a step-function, we have $\sigma^2 = \sum_\nu \sigma_\nu^2$, and it follows that $\xi(t) = 0$ 
with a probability equal to 1, so that (22.6) gives a “stochastic Fourier 
expansion” of x(t) (Slutsky, [80]). 
Even when F(x) is arbitrary, we can obtain a spectral representation of x(t) 
generalizing (22.6). In fact, it can be shown (Cramér, [14]) that x(t) can always 
be represented by a Fourier-Stieltjes integral 

$$x(t) = \int_{-\infty}^{\infty} e^{itu} \, dz(u), \tag{22.7}$$

where z(u) is a random function attached to a bounded orthogonal process 
(cf. 21), such that 

$$E|z(u + \Delta u) - z(u)|^2 = F(u + \Delta u) - F(u).$$

Conversely, we have 

$$\Delta z(u) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-it(u + \Delta u)} - e^{-itu}}{-it} \, x(t) \, dt, \tag{22.8}$$

so that there is a one-one correspondence between x(t) and $\Delta z(u)$. The integrals 
(22.7) and (22.8) are defined as limits in the mean, as shown above in 17 and 21. 
These results are in close correspondence with generalized harmonic analysis for 
an arbitrary function, as developed by Wiener [83] and Bochner [4]. The 
spectral representation of a stochastic process has important applications, some of 
which will be considered in a forthcoming paper by Karhunen [40]. An 
extension of the spectral representation to a more general class of processes has been 
given by Loève [68]. 

When, in particular, the x(t) process is such that the joint distribution of any 
group of variables $x(t_1), \ldots, x(t_n)$ is normal, it follows that any increment 
$\Delta z(u)$ is normally distributed. Since two uncorrelated normally distributed 
variables are always independent, it follows that in this case the z(u) process 
is a differential process with normally distributed increments. Important 
results for this case have recently been given by Doob [22]. 

The properties of continuity, differentiability etc. for processes of the type 
here considered are still incompletely known, and further work is required. 
A further group of important unsolved problems are connected with an 
interesting decomposition theorem by Wold [84], which holds for processes with 
a discrete time variable. The generalization of this theorem to the continuous 
case does not seem to have so far been given in a final form. 
THE ESTIMATION OF DISPERSION FROM DIFFERENCES¹ 

By Anthony P. Morse² and Frank E. Grubbs 

Ballistic Research Laboratory, Aberdeen Proving Ground, Maryland 

Summary. The estimation of variance by use of successive differences of 
higher order is discussed in this paper. Heretofore, attention has been focused, 
in published works, on estimates of variance obtained by employing the sum of 
squares of deviations from the mean and also by using mean square successive 
differences of the first order [1], [2], [3], [9]. A concise description of the method 
employing differences of any order with appropriate formulae for the precision 
of estimates so obtained and also a practical example on the use of the technique 
are given in section 11. Fundamental contributions to the estimation of 
variance from higher order differences, a study of the efficiency of the technique 
and proper orientation of the subject matter in the field of mathematical 
statistics are given in sections 2-10 of the paper. 

1. Introduction. It frequently happens that successive observations, made 
at regular intervals of time, are subject to the same standard error while the 
means of the populations from which they are drawn display some kind of trend. 
The type of trend we speak of is brought about because of the manner in which 
we have to take measurements or because of variations in the measuring 
technique itself, or, again, the trend may be characteristic of the thing we are 
measuring. In any event, we may desire to eliminate the trend in order to study 
residual effects. As an example, it is desirable in the field of ballistics to evaluate 
the dispersion of machine guns firing from a moving airplane. 
It may also happen that it is either inexpedient or impossible to estimate the 
standard error of the observations by the method of least squares, for in a large 
number of cases the type of trend is unknown. In this event a method employing 
differences of an appropriate order may prove valuable. The method consists 
merely of arranging the data in a vertical column in the order in which the 
observations were taken and then forming difference columns in the usual way of 
order 1, 2, up to say 5 or some other number depending on the peculiarities of the 
problem at hand and the number of the original observations. Next, sum the 
squares of the numbers in each column and divide the sum of squares of the pth 
order differences by $\binom{2p}{p}(n - p)$. When $n \ge 2$ and $p \ge 1$, the numbers thus 
arrived at are all unbiased estimates of the population variance $\sigma^2$ for the case 
where all the observations have the same expected value. In section 11 at the 

¹ This paper is based substantially on a Ballistic Research Laboratory Report [10] 
of the same subject by Morse and has been prepared for publication by Grubbs at the 
suggestion of R. H. Kent. The authors are grateful to J. V. Lewis and H. L. Meyer for their 
many and varied comments, criticisms and suggestions. 

² Now at the University of California, Berkeley, California. 
end of the paper will be found a summary of this method, formulas by which 
the precision of the estimate of the variance $\sigma^2$ may be determined, and an 
example displaying the stability of this estimate with respect to p. 
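
The procedure just described may be sketched in a few lines. The divisor used below, (n - p) times the binomial coefficient C(2p, p), is the one that makes the estimate unbiased when all observations have the same expected value, since the pth difference of independent observations of variance sigma^2 has variance C(2p, p) sigma^2. The function name and the sample data are hypothetical.

```python
from math import comb

def variance_from_differences(obs, p):
    """Estimate the variance from pth order successive differences.

    The sum of squared pth differences is divided by (n - p) * C(2p, p),
    where n is the number of observations.
    """
    n = len(obs)
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    diffs = list(obs)
    for _ in range(p):
        # Replace the column by the column of its successive differences.
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return sum(d * d for d in diffs) / ((n - p) * comb(2 * p, p))

data = [1, 2, 4, 7, 11]   # hypothetical readings following a quadratic trend
estimates = {p: variance_from_differences(data, p) for p in (1, 2, 3)}
```

For these error-free readings the estimates are 3.75 for p = 1, about 0.167 for p = 2 and exactly 0 for p = 3, which shows how a quadratic trend inflates the lower order estimates and is removed completely by third differences.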

If a strong trend is present then the method of first differences will obviously 
yield an estimate of variance which is fictitiously large and the temptation to 
pass to higher order differences may quite reasonably be yielded to. As a matter 
of fact, unbiased estimates may be hoped for from pth order differences whenever 
there is good reason to suppose that the pth derivative of the trend function is 
small most of the time. However, even in the case of a sinusoidal trend where 
all derivatives have the same magnitude one may obtain good results from higher 
differences provided there are at least seven observations in each interval of 
length one period (see section 5 and Table II below). In connection with trends 
such as the sinusoidal type, the hopelessness of getting, say, even a fifth degree 
polynomial to fit over an interval of, say, 20 periods is rather evident. It is 
for the above reasons that estimation of variance from higher order differences 
deserves consideration. 

2. Historical comment. A brief historical development of the interest in 
successive differences as a means for estimating dispersion is given in [3]. This 
paper discusses the statistic 



suggested by “Student” [W. S. Gossett] and E. S. Pearson and points out the 
relevant work of Jordan, Helmert, Vallier, Cranz, and Becker. It seems that 
Jordan devised methods based on sums of powers of the differences, whereas 
Helmert gave more careful consideration to the case of the first power, i.e. the 
sum of absolute differences. Reference [3] points out, however, that in these 
two cases all the n(n - 1)/2 differences that can be established from a sample of 
n observations were included in the estimates of dispersion recommended by 
Jordan and Helmert, so that the estimate was of no value in reducing the effect 
of a trend. Continuing the remarks of [3], we learn that in ballistics Vallier 
appears to have been the first to estimate dispersion from successive differences 
and that Cranz and Becker commended the mean successive difference 


$$\frac{\sum_{i=1}^{n-1} |x_{i+1} - x_i|}{n - 1}$$


in estimating dispersion in range of guns since they were aware of variable
external effects (such as tail winds) on a projectile. In this country, Bennett [1]
appears to have suggested the use of successive differences independently of
European ballisticians. In this connection, Bennett suggested that the probable

TABLE I

The Efficiency, W(n, p), of $\delta^2_{n,p}$ as an Estimate of $\sigma^2$

 n \ p       1       2       3       4       5       6       7       8       9      10
     2  1.00000
     3   .80000  .50000
     4   .75000  .46154  .33333
     5   .72727  .46552  .32000  .25000
     6   .71429  .47213  .33149  .24427  .20000
     7   .70588  .47771  .34453  .25510  .19672  .16667
     8   .70000  .48214  .35537  .26871  .20633  .16471  .14286
     9   .69565  .48568  .36408  .28071  .21888  .17274  .14159  .12500
    10   .69231  .48855  .37113  .29071  .23068  .18385  .14830  .12414  .11111
    11   .68966  .49091  .37691  .29904  .24070  .19476  .15802  .12978  .11050  .10000
    12   .68750  .49288  .38173  .30602  .24934  .20450  .16798  .13827  .11629  .09955
    13   .68571  .49455  .38580  .31194  .25672  .21300  .17714  .14729  .12271  .10366
    14   .68421  .49598  .38928  .31701  .26308  .22039  .18530  .15581  .13086  .11018
    15   .68293  .49722  .39228  .32139  .26859  .22684  .19250  .16353  .13874  .11754
    16   .68182  .49831  .39490  .32522  .27342  .23251  .19887  .17045  .14601  .12481
    17   .68085  .49926  .39721  .32859  .27767  .23752  .20452  .17664  .15260  .13162
    18   .68000  .50011  .39925  .33158  .28145  .24197  .20956  .18218  .15855  .13787
    19   .67925  .50087  .40107  .33424  .28482  .24595  .21407  .18715  .16393  .14356
    20   .67857  .50155  .40271  .33663  .28784  .24953  .21813  .19164  .16879  .14875
    21   .67797  .50216  .40419  .33880  .29058  .25276  .22181  .19571  .17321  .15347
    22   .67742  .50272  .40553  .34075  .29306  .25569  .22515  .19941  .17723  .15778
    23   .67692  .50323  .40676  .34254  .29532  .25837  .22819  .20279  .18091  .16173
    24   .67647  .50370  .40787  .34417  .29739  .26082  .23098  .20588  .18428  .16535
    25   .67606  .50413  .40889  .34567  .29929  .26307  .23354  .20873  .18738  .16869
    26   .67568  .50452  .40984  .34706  .30104  .26514  .23590  .21135  .19024  .17177
    27   .67533  .50489  .41071  .34833  .30266  .26705  .23809  .21378  .19289  .17463
    28   .67500  .50523  .41152  .34951  .30416  .26884  .24012  .21603  .19535  .17728
    29   .67470  .50555  .41228  .35062  .30555  .27049  .24200  .21812  .19764  .17975
    30   .67442  .50585  .41298  .35165  .30686  .27203  .24375  .22007  .19978  .18205
    31   .67416  .50612  .41363  .35260  .30807  .27347  .24539  .22190  .20177  .18420
    32   .67391  .50638  .41425  .35350  .30921  .27482  .24693  .22361  .20364  .18622
    33   .67368  .50662  .41482  .35434  .31027  .27608  .24837  .22521  .20539  .18811
    34   .67347  .50685  .41536  .35513  .31128  .27727  .24973  .22672  .20704  .18989
    35   .67327  .50707  .41587  .35588  .31222  .27839  .25101  .22814  .20859  .19157
    36   .67308  .50727  .41635  .35658  .31312  .27945  .25221  .22949  .21006  .19315
    37   .67290  .50746  .41681  .35724  .31396  .28045  .25335  .23075  .21146  .19465
    38   .67273  .50764  .41724  .35787  .31476  .28140  .25443  .23195  .21276  .19606
    39   .67257  .50781  .41766  .35847  .31551  .28229  .25546  .23309  .21401  .19741
    40   .67241  .50797  .41804  .35904  .31623  .28314  .25642  .23417  .21519  .19868
    42   .67213  .50828  .41876  .36009  .31756  .28472  .25822  .23617  .21738  .20105
    44   .67188  .50855  .41941  .36104  .31877  .28615  .25986  .23799  .21937  .20320
    46   .67164  .50880  .42000  .36191  .31987  .28745  .26135  .23966  .22118  .20516
    48   .67143  .50903  .42055  .36271  .32088  .28865  .26271  .24117  .22284  .20695
    50   .67123  .50925  .42105  .36343  .32180  .28975  .26397  .24256  .22437  .20860
    52   .67105  .50944  .42151  .36411  .32266  .29076  .26512  .24385  .22578  .21012
    54   .67089  .50962  .42193  .36473  .32345  .29170  .26619  .24504  .22708  .21153
    56   .67073  .50979  .42233  .36531  .32418  .29257  .26718  .24614  .22829  .21284
    58   .67059  .50995  .42270  .36585  .32487  .29338  .26811  .24717  .22941  .21405
    62   .67033  .51022  .42337  .36682  .32609  .29484  .26977  .24903  .23144  .21624
    66   .67010  .51048  .42395  .36767  .32718  .29612  .27123  .25066  .23322  .21817
    70   .66990  .51069  .42447  .36843  .32813  .29726  .27252  .25209  .23479  .21987
    74   .66972  .51089  .42492  .36910  .32898  .29826  .27368  .25337  .23619  .22138
    78   .66957  .51107  .42534  .36970  .32975  .29917  .27471  .25452  .23745  .22274
    82   .66942  .51122  .42571  .37024  .33043  .29998  .27564  .25556  .23859  .22397
    90   .66917  .51150  .42636  .37118  .33162  .30139  .27725  .25735  .24056  .22609
    98   .66897  .51172  .42689  .37197  .33262  .30257  .27860  .25885  .24219  .22786
   106   .66879  .51192  .42735  .37263  .33346  .30357  .27974  .26012  .24358  .22936
   114   .66864  .51208  .42774  .37321  .33418  .30443  .28071  .26121  .24477  .23065
   122   .66851  .51223  .42808  .37370  .33482  .30518  .28156  .26216  .24581  .23177
   138   .66829  .51247  .42864  .37452  .33585  .30641  .28297  .26372  .24752  .23362
   154   .66812  .51266  .42909  .37517  .33667  .30738  .28408  .26496  .24887  .23508
   170   .66798  .51281  .42944  .37570  .33734  .30817  .28498  .26596  .24997  .23627
   202   .66777  .51304  .43000  .37649  .33836  .30937  .28635  .26749  .25164  .23808
   234   .66762  .51322  .43040  .37708  .33909  .31025  .28735  .26860  .25285  .23939
   266   .66751  .51336  .43070  .37752  .33965  .31091  .28810  .26944  .25377  .24038
   330   .66734  .51353  .43112  .37814  .34044  .31185  .28917  .27063  .25508  .24179
   394   .66723  .51365  .43141  .37856  .34097  .31248  .28990  .27143  .25596  .24274
   522   .66709  .51381  .43178  .37910  .34164  .31327  .29081  .27244  .25707  .24394
   778   .66696  .51396  .43216  .37963  .34233  .31408  .29173  .27347  .25819  .24516
  1290   .66684  .51409  .43245  .38007  .34288  .31474  .29248  .27430  .25910  .24613
  2314   .66676  .51418  .43264  .38036  .34326  .31518  .29298  .27486  .25971  .24680
   ∞     .66667  .51429  .43290  .38073  .34372  .31573  .29361  .27556  .26048  .24763

error should be estimated from the root mean square successive differences as
follows:

$$\text{P.E.} = .6745 \sqrt{\frac{\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2}{2(n - 1)}}$$


In 1940, J. von Neumann and R. H. Kent in [2] investigated further the estimation
of probable error from mean square successive differences (sums of squares
of first differences). J. von Neumann, R. H. Kent, H. R. Bellinson, and B. I.
Hart [3] considered the distribution of

$$\delta^2 = \frac{1}{n - 1} \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2$$


in a paper which appeared in June 1941. J. D. Williams [4] obtained the
moments of $\eta = \delta^2 / s^2$, where

$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2,$$

and indicated that the $r$th moment of $\eta$ is equal to the $r$th moment of $\delta^2$ divided
by the $r$th moment of $s^2$. The distribution of the ratio of the mean square
successive difference to the variance has been published by J. von Neumann
[5], [6] and B. I. Hart tabulated the probability integral and obtained percentage
points for this statistic ([7], [8]). Indeed, it should be remarked that the statistical
theory of successive differences is allied with the problem of serial correlation
[9]. Finally, the use of squared differences of higher order than the first for
estimating variance appears to have been suggested by A. A. Bennett. Quite
independently, a treatment of the subject was given by Morse [10] in connection
with problems on exterior ballistics. Various results on successive-difference
estimation including significance tests have been given by Tintner [13]. One of
Tintner’s tests involves the use of selected sets of differences.

3. Definitions and notations. Suppose the observations $x_1, x_2, x_3, \cdots, x_n$
are made at times $a = t_1 < t_2 < t_3 < \cdots < t_n = b$ and the $t_i$ are uniformly spaced
without error. Let $f(t_i)$ be the true trend so that $\eta_i = f(t_i)$ is the mean of the
population from which $x_i$ is drawn and $\epsilon_i = x_i - \eta_i$ is a random error. Further,
let $p$ be a non-negative integer less than $n$ and denote the $i$th backward difference
of order $p$ of $x$ by $\Delta^p x_i$, i.e.

$$\Delta^p x_i = \Delta^{p-1} x_i - \Delta^{p-1} x_{i-1} = \sum_{r=0}^{p} (-1)^r \binom{p}{r} x_{i-r},$$

where $\binom{p}{r} = \dfrac{p!}{r!\,(p-r)!}$.
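We define the following:

$$\delta^2_{n,p} = \frac{1}{\binom{2p}{p}(n-p)} \sum_{i=p+1}^{n} (\Delta^p \epsilon_i)^2; \tag{1}$$

$$d^2_{n,p} = \frac{1}{\binom{2p}{p}(n-p)} \sum_{i=p+1}^{n} (\Delta^p x_i)^2; \tag{2}$$

$$v^2_{n,p} = \frac{1}{\binom{2p}{p}(n-p)} \sum_{i=p+1}^{n} (\Delta^p \eta_i)^2; \tag{3}$$

$$k_{n,p} = \frac{2}{\binom{2p}{p}(n-p)} \sum_{i=p+1}^{n} (\Delta^p \epsilon_i)(\Delta^p \eta_i). \tag{4}$$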

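By $E(u)$ we will mean the expected value of $u$, whereas the variance of $u$ will
be denoted by

$$\operatorname{Var}(u) = E\{u - E(u)\}^2.$$

Basically, we shall assume that the $\epsilon_i$ are sufficiently Gaussian and independent
that

$$E(\epsilon_i) = E(\epsilon_i^3) = 0, \quad E(\epsilon_i^2) = \sigma^2,$$

$$\mu_4 = E(\epsilon_i^4) = 3\sigma^4,$$

$$E(\epsilon_i^\alpha \epsilon_j^\beta) = E(\epsilon_i^\alpha) E(\epsilon_j^\beta),$$

whenever $i$, $j$, $\alpha$ and $\beta$ are positive integers for which

$$i \neq j, \quad 1 \leq i \leq n, \quad 1 \leq j \leq n.$$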

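4. Expected values. We will now determine the mean or expected values
of $\delta^2_{n,p}$ and $d^2_{n,p}$. Since the $\epsilon_i$ are independent with mean zero and variance $\sigma^2$,

$$E(\delta^2_{n,p}) = \frac{1}{\binom{2p}{p}(n-p)} \sum_{i=p+1}^{n} \sum_{r=0}^{p} \binom{p}{r}^2 \sigma^2,$$

or

$$E(\delta^2_{n,p}) = \sigma^2 \tag{5}$$

(see Lemma 1.3 of section 6 below).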
Continuing, we have

$$E(d^2_{n,p}) = \frac{1}{\binom{2p}{p}(n-p)} \sum_{i=p+1}^{n} E\{(\Delta^p \epsilon_i + \Delta^p \eta_i)^2\} = E(\delta^2_{n,p}) + \frac{1}{\binom{2p}{p}(n-p)} \sum_{i=p+1}^{n} (\Delta^p \eta_i)^2,$$

or

$$E(d^2_{n,p}) = \sigma^2 + v^2_{n,p}. \tag{6}$$

Consequently, we observe, $d^2_{n,p}$ is on the average larger than $\sigma^2$ by the quantity
$v^2_{n,p}$. In a particular problem, therefore, we are faced with the situation of
choosing that combination of $n$ and $p$ which (i) regulates the size of $v^2_{n,p}$ and (ii)
gives the desired precision of our estimate of variance.
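Relation (6) rests on the algebraic identity $d^2_{n,p} = \delta^2_{n,p} + k_{n,p} + v^2_{n,p}$, which follows from $\Delta^p x_i = \Delta^p \epsilon_i + \Delta^p \eta_i$. The Python sketch below checks that identity numerically. It is only an illustration: the function names, the trend values and the "errors" are all made up for the purpose.

```python
from math import comb


def pth_differences(values, p):
    """p-th backward differences of a sequence."""
    out = list(values)
    for _ in range(p):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def decompose(eta, eps, p):
    """Return (delta2, k, v2, d2) as in definitions (1)-(4)."""
    n = len(eta)
    scale = comb(2 * p, p) * (n - p)
    d_eta = pth_differences(eta, p)
    d_eps = pth_differences(eps, p)
    d_x = pth_differences([e + m for e, m in zip(eps, eta)], p)
    delta2 = sum(e * e for e in d_eps) / scale
    v2 = sum(m * m for m in d_eta) / scale
    k = 2 * sum(e * m for e, m in zip(d_eps, d_eta)) / scale
    d2 = sum(x * x for x in d_x) / scale
    return delta2, k, v2, d2


# Hypothetical data: a quadratic trend and eight arbitrary "errors".
eta = [0.5 * t * t for t in range(8)]
eps = [0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.3]
delta2, k, v2, d2 = decompose(eta, eps, 1)
print(d2, delta2 + k + v2)
```

Averaging over the errors removes the cross term $k_{n,p}$, which is how (6) arises; for the quadratic trend used here, `decompose(eta, eps, 3)` gives a trend term `v2` of exactly zero.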

5. The magnitude of $v^2_{n,p}$. In order to study the size of $v^2_{n,p}$, we will derive
for this quantity an upper bound which will indicate the applicability of the
method of differences to non-polynomial as well as polynomial trends.

Now,

$$\Delta^p \eta_i = \Delta^p f(t_i) = \int_{t_{i-1}}^{t_i} \int_0^h \cdots \int_0^h f^{(p)}(y_1 - y_2 - \cdots - y_p)\, dy_p \cdots dy_2\, dy_1,$$

where $t_r - t_{r-1} = h$, by straightforward integration. It will be convenient to
change the order of integration; thus

$$\Delta^p \eta_i = \int_0^h \cdots \int_0^h \int_{t_{i-1}}^{t_i} f^{(p)}(y_1 - y_2 - \cdots - y_p)\, dy_1\, dy_p \cdots dy_2.$$

Since, from Schwarz’s inequality it is clear that

$$\left\{\int_\alpha^\beta g(s)\, ds\right\}^2 \leq (\beta - \alpha) \int_\alpha^\beta \{g(s)\}^2\, ds$$

whenever $\alpha$ and $\beta$ are real numbers and $g$ is integrable, we have

$$(\Delta^p \eta_i)^2 \leq h^p \int_0^h \cdots \int_0^h \int_{t_{i-1}}^{t_i} \{f^{(p)}(y_1 - y_2 - \cdots - y_p)\}^2\, dy_1\, dy_p \cdots dy_2.$$

Also,

$$\sum_{i=p+1}^{n} (\Delta^p \eta_i)^2 \leq h^p \int_0^h \cdots \int_0^h \int_{t_p}^{t_n} \{f^{(p)}(y_1 - y_2 - \cdots - y_p)\}^2\, dy_1\, dy_p \cdots dy_2.$$

But for $0 \leq \tau \leq (p - 1)h = t_p - a$ we have

$$\int_{t_p}^{t_n} \{f^{(p)}(y_1 - \tau)\}^2\, dy_1 = \int_{t_p - \tau}^{t_n - \tau} \{f^{(p)}(s)\}^2\, ds \leq \int_a^b \{f^{(p)}(s)\}^2\, ds.$$

Consequently

$$\sum_{i=p+1}^{n} (\Delta^p \eta_i)^2 \leq h^p \int_0^h \cdots \int_0^h \int_a^b \{f^{(p)}(s)\}^2\, ds\, dy_p \cdots dy_2 = h^{2p-1} \int_a^b \{f^{(p)}(s)\}^2\, ds.$$

Since $h = \dfrac{b - a}{n - 1}$ we have finally

$$v^2_{n,p} \leq \frac{1}{\binom{2p}{p}} \left(\frac{b - a}{n - p}\right) \left(\frac{b - a}{n - 1}\right)^{2p-1} \int_a^b \frac{\{f^{(p)}(s)\}^2\, ds}{b - a}, \tag{7}$$

which is an upper bound for $v^2_{n,p}$ in terms of the average value of the square of
the $p$th derivative of the trend function, $f$.

If the trend function $f$ is of the polynomial form,¹

$$f(t) = \sum_{r=0}^{p} a_r t^r,$$

then the effect of the trend can be eliminated from our observations by estimating
dispersion from $(p + 1)$st differences. However, if it is known that the trend is
of polynomial form, then an estimate of dispersion based on least squares would,
of course, be better. In fact, it will be shown later that the precision of $\delta^2_{n,p}$
decreases markedly as $p$ increases. The use of $d^2_{n,p}$ as an estimate of $\sigma^2$ is primarily
of value when the type of trend is unknown; however, even when the type
of trend is known the computational simplicity of $d^2_{n,p}$ may offset to some extent
its lack of optimum precision.

Let us reflect on the magnitude of $v^2_{n,p}$ over a single period of a sinusoidal trend,
say $f(t) = \sin t$. In (7) we set $a = 0$, $b = 2\pi$ and secure

$$v^2_{n,p} \leq \frac{1}{2\binom{2p}{p}} \left(\frac{2\pi}{n - p}\right) \left(\frac{2\pi}{n - 1}\right)^{2p-1}.$$

Taking $n$ to be the number of observations for a complete period, a tabulation of
the upper bound for $v^2_{n,p}$ for this case is given in Table II. Thus, when there
are about seven or more observations in each interval of length one period, estimation
of dispersion from higher order differences may prove of considerable
value even for this rather extreme type of trend.
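The bound (7) for the sinusoidal case is easy to evaluate directly. The short Python sketch below does so. The function name `sinusoid_bound` is hypothetical, and the factor 1/2 is the average of $\sin^2$ or $\cos^2$ over one full period, which is the average squared $p$th derivative of $f(t) = \sin t$.

```python
from math import comb, pi


def sinusoid_bound(n, p):
    """Upper bound (7) for v^2_{n,p} with f(t) = sin t, a = 0, b = 2*pi."""
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    mean_square_derivative = 0.5  # average of sin^2 or cos^2 over one period
    h = 2 * pi / (n - 1)          # spacing of the n observations
    return (2 * pi / (n - p)) * h ** (2 * p - 1) * mean_square_derivative / comb(2 * p, p)


for p in (1, 2, 3):
    print(p, [round(sinusoid_bound(n, p), 3) for n in (5, 6, 7, 8, 9)])
```

For a fixed order $p$ the bound falls quickly as the number of observations per period grows, which is the behaviour the text describes.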

6. Some combinatorial relations. Although we will ultimately establish
expressions for the variances of $\delta^2_{n,p}$ and $d^2_{n,p}$, it appears desirable to give first a
number of combinatorial relations which present themselves in the computation
of moments. The relations are easily checked and most of them are possibly
well known. Nevertheless, it will be convenient to record them for reference
and in some instances to give proofs. In what follows it will be understood that

$\binom{p}{q} = 0$ whenever $p$ and $q$ are not such integers that $0 \leq q \leq p$.

TABLE II

 p \ n      5      6      7      8      9     10
   1     .617   .395   .274   .201   .154   .110
   2     .676   .260   .120   .063   .036   .016
   3     .751   .164   .049   .018   .008   .002
   4     1.06   .111   .021   .005   .002   .0003
   5      -     .098   .009   .002   .0004  .0000



Lemma 1.1. 2 ( P ) = p(*_ }). 

Lemma 1.2. ^ P r V 

Lemma 1.3. E (*) (, * ,) - ( p % ,) . 


Proof: 


t s = (l + x)»- {(H-xr } 2 = tef P 


= IE 


p ~y 


Hence 


and 


(?) = ?©(-,)• 


U.)-?(!)(, 


o*-©-('7‘) + C'-9- 

“ O-e:!)}- 


VP + s - r 

Lemma 1 4. 7/ p 3 + r 2 > 

Lemma 15. (p — 2r) (^j = p|(J* ~ 

Lemma 1.6. $(p - 2r)\binom{p}{r} = p\left\{\binom{p-1}{r} - \binom{p-1}{r-1}\right\}$.

Proof: Multiply, using 1.4 and 1.5.

Lemma 1.7.¹ $r\binom{2p}{p+r} = p\left\{\binom{2p-1}{p-r} - \binom{2p-1}{p-r-1}\right\}$.

¹ Major A. A. Bennett communicated this Lemma.

Proof: $(s - 2t)\binom{s}{t} = s\left\{\binom{s-1}{t} - \binom{s-1}{t-1}\right\}$ from 1.6.

Put $s = 2p$, $t = p - r$, then

$$2r\binom{2p}{p+r} = 2p\left\{\binom{2p-1}{p-r} - \binom{2p-1}{p-r-1}\right\}.$$
Lemma 1.8. If $f$ is a function, $i$, $n$, $p$ are integers and $p + 1 \leq i \leq n$, then


sG *>-->=£(?) 


/O’). 


Proof: 




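Lemma 1.9. If $-\infty < A(r, s) = A(s, r) < \infty$ for each integer $r$ and $s$, then

$$E\left(\left\{\sum_{r=1}^{n}\sum_{s=1}^{n} A(r, s)\epsilon_r\epsilon_s\right\}^2\right) = (\mu_4 - 3\sigma^4)\sum_{r=1}^{n} A(r, r)^2 + \sigma^4\left\{\sum_{r=1}^{n} A(r, r)\right\}^2 + 2\sigma^4\sum_{r=1}^{n}\sum_{s=1}^{n} A(r, s)^2.$$

Proof: Let $N(r, s) = 1$ when $r < s$ and let $N(r, s) = 0$ otherwise. Clearly

$$\sum_{r=1}^{n}\sum_{s=1}^{n} A(r, s)\epsilon_r\epsilon_s = \sum_{r=1}^{n} A(r, r)\epsilon_r^2 + 2\sum_{r=1}^{n}\sum_{s=1}^{n} N(r, s)A(r, s)\epsilon_r\epsilon_s,$$

and

$$E\left(\left\{\sum_{r=1}^{n}\sum_{s=1}^{n} A(r, s)\epsilon_r\epsilon_s\right\}^2\right) = E\left(\left\{\sum_{r=1}^{n} A(r, r)\epsilon_r^2\right\}^2\right) + 4E\left(\left\{\sum_{r=1}^{n}\sum_{s=1}^{n} N(r, s)A(r, s)\epsilon_r\epsilon_s\right\}^2\right).$$

Now

$$E\left(\left\{\sum_{r=1}^{n} A(r, r)\epsilon_r^2\right\}^2\right) = (\mu_4 - \sigma^4)\sum_{r=1}^{n} A(r, r)^2 + \sigma^4\left\{\sum_{r=1}^{n} A(r, r)\right\}^2,$$

and

$$4E\left(\left\{\sum_{r=1}^{n}\sum_{s=1}^{n} N(r, s)A(r, s)\epsilon_r\epsilon_s\right\}^2\right) = 4\sigma^4\sum_{r=1}^{n}\sum_{s=1}^{n} N(r, s)A(r, s)^2 = 2\sigma^4\sum_{r=1}^{n}\sum_{s=1}^{n} A(r, s)^2 - 2\sigma^4\sum_{r=1}^{n} A(r, r)^2.$$

The last three relations combine to yield the desired result.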

Lemma 1.10 ( n ~ 

+-‘§ s It G--)(<- 

Proof: Helped by 1.8, check that 


V 
— r 


{§<-’>'(?)-Ms ^ 

AH 

st-’K-OC-.)•■••■ 



Therefore 


Let 


(?) <- - »>'- - £5 ,t 0-X - 


tr «« 


and apply 1.9 to conaplete the proof. 
Lemma 1.11 


SlLtC-X- 

- (n - p) X. G +)' - 2p Cv 7 +2p l 2 "»7 ■ 


Proof. 


§sLtC^)C- 

■^.,tSSC:)G + 5- < )G-r)C-.)-'*'“ i 
- t ,4 £ § (?) C +’,-<) (?) C + J- <)• “"■* 
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-&&?C)Cr + 5-<)?(?)C + 5-.) 

■ & {? (?) C+? -<)}’ - & ,t C 4 - ; )'*» «■ 

-X ( ”- p - |ri) C¥ r y' 

- S’ <»-»- I ’■I>(p + r 


fwj- n 


=<»->xc¥j-i>{(¥: r i y-cx i 1 y}.^- ; 


= (n — p) 


.1 


SAX-.y-^O- 


Lemma 1.12. 


Proof. 


Z £ (. p Y- £ til')'- £ £(jY. from 1.8; 

r—1 ji+ 1 V V p+1 r—1 V V ’—P+1 r — 0 V / 

= (n - p) Z (f) = (» - P) ^ 2? 


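7. The variances of $\delta^2_{n,p}$ and $d^2_{n,p}$. In order to get some idea as to the efficiency
of the statistics $\delta^2_{n,p}$ and $d^2_{n,p}$, we will examine their variances. We have

$$\binom{2p}{p}^2 (n-p)^2 \operatorname{Var}(\delta^2_{n,p}) = \binom{2p}{p}^2 (n-p)^2 \left\{E(\delta^4_{n,p}) - [E(\delta^2_{n,p})]^2\right\},$$

which is evaluated with the aid of Lemmas 1.10, 1.11, 1.12 and using the relation $\mu_4 - 3\sigma^4 = 0$.
Thus,

$$\binom{2p}{p}^2 (n-p)^2 \operatorname{Var}(\delta^2_{n,p}) = 2\sigma^4\left\{(n-p)\sum_{r=p-n+1}^{n-p-1}\binom{2p}{p+r}^2 - 2p\binom{2p-1}{p}^2 + 2p\binom{2p-1}{n-1}^2\right\}. \tag{8}$$

If $2p < n$, then

$$\sum_{r=p-n+1}^{n-p-1}\binom{2p}{p+r}^2 = \sum_{r=0}^{2p}\binom{2p}{r}^2 = \binom{4p}{2p}.$$

Moreover

$$\binom{2p-1}{n-1} = 0.$$

Therefore,

$$\binom{2p}{p}^2 (n-p)^2 \operatorname{Var}(\delta^2_{n,p}) = 2(n-p)\binom{4p}{2p}\sigma^4 - 4p\binom{2p-1}{p}^2\sigma^4 \tag{9}$$

when $2p < n$.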

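As for the variance of $d^2_{n,p}$, we have

$$\operatorname{Var}(d^2_{n,p}) = E\{d^2_{n,p} - v^2_{n,p} - \sigma^2\}^2 = E\{\delta^2_{n,p} + k_{n,p} + v^2_{n,p} - v^2_{n,p} - \sigma^2\}^2 = E\{(\delta^2_{n,p} - \sigma^2) + k_{n,p}\}^2,$$

or

$$\operatorname{Var}(d^2_{n,p}) = \operatorname{Var}(\delta^2_{n,p}) + E(k^2_{n,p}), \tag{10}$$

since $E[(\delta^2_{n,p} - \sigma^2)k_{n,p}] = 0$.

However, from Schwarz’s inequality, it is guaranteed that

$$E(k^2_{n,p}) \leq 4 v^2_{n,p}\sigma^2.$$

Thus

$$\operatorname{Var}(d^2_{n,p}) \leq \operatorname{Var}(\delta^2_{n,p}) + 4 v^2_{n,p}\sigma^2. \tag{11}$$

An upper bound has already been given for $v^2_{n,p}$ in section 5 above.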

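8. The efficiency of $\delta^2_{n,p}$. It is appropriate to consider the efficiency (as
defined by Fisher [11]) of the statistic $\delta^2_{n,p}$. In this sense, the efficiency of
$\delta^2_{n,p}$ is given by

$$W(n, p) = \frac{\operatorname{Var} s^2_n}{\operatorname{Var} \delta^2_{n,p}}, \quad \text{where } s^2_n = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}.$$

Accordingly,

$$W(n, p) = \frac{2\sigma^4}{(n - 1)\operatorname{Var}(\delta^2_{n,p})}$$

or

$$W(n, p) = \frac{\binom{2p}{p}^2 (n-p)^2}{(n-1)\left\{(n-p)\sum_{r=p-n+1}^{n-p-1}\binom{2p}{p+r}^2 - 2p\left[\binom{2p-1}{p}^2 - \binom{2p-1}{n-1}^2\right]\right\}}. \tag{12}$$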


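If $2p < n$,

$$W(n, p) = \frac{\binom{2p}{p}^2 (n-p)^2}{(n-1)\left\{(n-p)\binom{4p}{2p} - 2p\binom{2p-1}{p}^2\right\}} \tag{13}$$

from (9); or

$$W(n, p) = \frac{(n-p)^2}{(n-1)\left\{(n-p)\binom{4p}{2p}\Big/\binom{2p}{p}^2 - \tfrac{1}{2}p\right\}} \tag{14}$$

if $2p < n$.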


Formulas (12) and (13) were used in preparing Table I given at the end of the
paper. For convenience in using formulas (1) and (2) the binomial coefficients
$\binom{2p}{p}$ for $0 \leq p \leq 10$ are given in Table III.
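Entries of Table I can be recomputed from the covariance structure of the $p$th differences, whose lag-$r$ covariance is proportional to $\binom{2p}{p+r}$. The Python sketch below is an illustration under that assumption and is not the authors' own computing scheme. The function names are hypothetical, and exact rational arithmetic is used so that the general lag sum and the closed form valid for $2p < n$ can be compared without rounding.

```python
from fractions import Fraction
from math import comb


def efficiency(n, p):
    """W(n, p) from the lag sum over the n - p available p-th differences."""
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    m = n - p  # number of p-th differences
    # Sum over lags r of (m - |r|) * C(2p, p + r)^2, skipping lags outside 0..2p.
    total = sum((m - abs(r)) * comb(2 * p, p + r) ** 2
                for r in range(-(m - 1), m) if 0 <= p + r <= 2 * p)
    return Fraction(comb(2 * p, p) ** 2 * m * m, (n - 1) * total)


def efficiency_closed_form(n, p):
    """Closed form for W(n, p), valid only when 2p < n."""
    if not 2 * p < n:
        raise ValueError("closed form requires 2p < n")
    denom = (n - p) * comb(4 * p, 2 * p) - 2 * p * comb(2 * p - 1, p) ** 2
    return Fraction(comb(2 * p, p) ** 2 * (n - p) ** 2, (n - 1) * denom)


print(float(efficiency(5, 2)), float(efficiency_closed_form(5, 2)))
```

The lag sum also covers the cases with $2p \geq n$, where the closed form does not apply.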


If $n \geq 2$, then

$$W(n, 1) = \frac{2(n - 1)}{3n - 4}, \tag{15}$$

as was pointed out by von Neumann, Kent, Bellinson, and Hart in [3].
If $n \geq 4$, then

$$W(n, 2) = \frac{18(n - 2)^2}{(n - 1)(35n - 88)}. \tag{16}$$

As a limiting value for $n$, we have

$$W(\infty, p) = \lim_{n \to \infty} W(n, p) = \frac{\binom{2p}{p}^2}{\binom{4p}{2p}}. \tag{17}$$

Using Stirling’s formula for the approximation to the factorial, we have

$$\lim_{p \to \infty} \sqrt{p}\, W(\infty, p) = \sqrt{\frac{2}{\pi}}.$$

Thus, as $p \to \infty$, $W(\infty, p)$ tends to zero and is asymptotically equal to $\sqrt{2/(\pi p)}$.

TABLE III

The Binomial Coefficient $\binom{2p}{p}$

   p     $\binom{2p}{p}$
   0          1
   1          2
   2          6
   3         20
   4         70
   5        252
   6        924
   7       3432
   8      12870
   9      48620
  10     184756


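For the case $n \geq 2$, $p \geq 1$ and $f$ constant, then $s^2_n = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$ and $\delta^2_{n,p}$
and $d^2_{n,p}$ are all unbiased estimates of the population variance $\sigma^2$. Moreover,
for this case

$$W(n, p) = \frac{\operatorname{Var}(s^2_n)}{\operatorname{Var}(\delta^2_{n,p})} = \frac{\operatorname{Var}(s^2_n)}{\operatorname{Var}(d^2_{n,p})}.$$

Using $s^2_m$, based on $m - 1$ degrees of freedom and keeping the trend, $f$, constant,
then $m$ and $n$ may be chosen so that approximately

$$\operatorname{Var}(s^2_m) = \operatorname{Var}(d^2_{n,p})$$

and for a normal population this means that

$$m = 1 + (n - 1)W(n, p).$$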
Using Table I, it may be seen that for constant trend, $f$, the worth of djo.io as
an estimate of $\sigma^2$ for a normal population is about the same as that of sn , whereas
that of din, i is about equivalent to s 2 o. However, if the trend $f$ is not constant
then the worth of $s^2_n$ as an estimate of $\sigma^2$ is diminished while that of $d^2_{n,p}$ is
increased.

Similarly, if the trend is cubic over 20 observations then least squares gives
an unbiased estimate of $\sigma^2$ based on 16 degrees of freedom, whereas $d^2_{20,4}$ gives an
estimate equivalent in precision to about 6.4 degrees of freedom. However, if
only eight observations follow a cubic trend, then least squares furnish an unbiased
estimate of $\sigma^2$ based on four degrees of freedom whereas $d^2_{8,4}$ furnishes an
estimate equivalent to about 1.9 degrees of freedom. Thus, in the case of 20
observations, cubic least squares is, so to speak, 2.5 times as valuable as $d^2_{20,4}$;
in the case of eight observations, cubic least squares is 2.1 times as valuable
as $d^2_{8,4}$.

It might be mentioned that the method of differences is of value in estimating
goodness of fit. If the fit is good, then our estimate of $\sigma^2$ derived from least
squares should on the average be equal to the estimate derived from a suitable
$d^2_{n,p}$. If the fit is poor then $d^2_{n,p}$ will be smaller on the average than the former.
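The degrees-of-freedom comparison above can be reproduced from the relation $m = 1 + (n - 1)W(n, p)$. The Python sketch below is illustrative only. The function names are hypothetical, $W(n, p)$ is evaluated through a lag sum over the covariances of the fourth differences, and the least squares degrees of freedom for a cubic are taken as $n - 4$.

```python
from math import comb


def efficiency(n, p):
    """W(n, p) via the lag sum over the n - p available p-th differences."""
    m = n - p
    total = sum((m - abs(r)) * comb(2 * p, p + r) ** 2
                for r in range(-(m - 1), m) if 0 <= p + r <= 2 * p)
    return comb(2 * p, p) ** 2 * m * m / ((n - 1) * total)


def equivalent_sample_size(n, p):
    """m = 1 + (n - 1) W(n, p) for a normal population and constant trend."""
    return 1 + (n - 1) * efficiency(n, p)


for n in (20, 8):
    dof = equivalent_sample_size(n, 4) - 1   # equivalent degrees of freedom
    least_squares_dof = n - 4                # cubic fit uses four constants
    print(n, round(dof, 1), round(least_squares_dof / dof, 1))
```

The two printed ratios correspond to the statement that cubic least squares is roughly two to two and a half times as valuable as fourth differences in these cases.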

9. The approximate probable error in estimating $\sigma$ from differences. The
approximate standard error of $\delta_{n,p}$ is given by the relation

$$\text{S.E.}(\delta_{n,p}) \approx \frac{1}{2\sigma}\,\text{S.E.}(\delta^2_{n,p}) = \frac{\sigma}{\sqrt{2(n - 1)W(n, p)}}.$$

If $p$ has been so chosen that $v^2_{n,p}$ is suitably small then [see equation (11)]
some confidence may be put in the approximate formulas:

$$\text{S.E.}(d_{n,p}) \approx \frac{\sigma}{\sqrt{2(n - 1)W(n, p)}}, \tag{18}$$

$$\text{P.E.}(d_{n,p}) \approx \frac{.6745\,\sigma}{\sqrt{2(n - 1)W(n, p)}}. \tag{19}$$

Formula (19) was used in preparing Table IV which gives the approximate
probable error to be feared in using $d_{n,p}$ as an estimate of $\sigma$. This table should
yield interesting information whenever $p$ has been chosen so that $d^2_{n,p}$ is a suitably
unbiased estimate of $\sigma^2$.
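Formula (19) is simple to evaluate once $W(n, p)$ is available. The Python sketch below computes the multiplier of $\sigma$ for $p \geq 1$. It is an illustration with hypothetical function names, and $W(n, p)$ is again obtained from a lag sum rather than read from Table I.

```python
from math import comb, sqrt


def efficiency(n, p):
    """W(n, p) via the lag sum over the n - p available p-th differences."""
    m = n - p
    total = sum((m - abs(r)) * comb(2 * p, p + r) ** 2
                for r in range(-(m - 1), m) if 0 <= p + r <= 2 * p)
    return comb(2 * p, p) ** 2 * m * m / ((n - 1) * total)


def probable_error_factor(n, p):
    """Formula (19) with sigma = 1: .6745 / sqrt(2 (n - 1) W(n, p))."""
    if not 1 <= p < n:
        raise ValueError("need 1 <= p < n")
    return 0.6745 / sqrt(2 * (n - 1) * efficiency(n, p))


for n, p in ((3, 1), (5, 2), (20, 1)):
    print(n, p, round(probable_error_factor(n, p), 4))
```

Multiplying the returned factor by $\sigma$ gives the approximate probable error of $d_{n,p}$, subject to the same proviso as in the text that $v^2_{n,p}$ be suitably small.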


10. Remarks. We have presented a useful technique for estimating variance
from higher order differences and have given the precision of our estimate. The
method of estimating variance from higher order differences appears to be quite
valuable in cases where the type of trend in our observations is unknown. A
considerable field of work remains concerning a complete investigation of the
distribution and other properties of the statistic $d^2_{n,p}$. In this connection,

Baer [12] has already published a study on the stochastic limit of-- d\, 1 . 

It is hoped that others will contribute to the problem of estimating dispersion 

TABLE IV

The Probable Error in Estimating $\sigma$ From Differences*

 n \ p     0      1      2      3      4      5      6      7      8      9     10
    1   .4769
    2   .3373  .4769
    3   .2753  .3771  .4769
    4   .2384  .3180  .4054  .4769
    5   .2133  .2796  .3495  .4215  .4769
    6   .1948  .2524  .3104  .3704  .4404  .4769
    7   .1803  .2317  .2817  .3318  .3855  .4390  .4769
    8   .1686  .2154  .2596  .3024  .3477  .3969  .4442  .4769
    9   .1589  .2022  .2420  .2794  .3183  .3604  .4057  .4481  .4769
   10   .1508  .1911  .2274  .2610  .2948  .3311  .3708  .4128  .4513  .4769
   11   .1438  .1816  .2153  .2457  .2758  .3074  .3417  .3794  .4186  .4537  .4769
   12   .1376  .1734  .2048  .2328  .2599  .2880  .3180  .3508  .3867  .4234  .4558
   13   .1323  .1663  .1958  .2217  .2465  .2717  .2983  .3272  .3587  .3930  .4276
   14   .1274  .1599  .1878  .2120  .2350  .2579  .2818  .3073  .3351  .3666  .3984
   15   .1231  .1542  .1808  .2035  .2248  .2459  .2677  .2905  .3152  .3423  .3718
   16   .1192  .1491  .1744  .1960  .2159  .2355  .2554  .2761  .2983  .3223  .3485
   17   .1156  .1445  .1687  .1892  .2080  .2262  .2447  .2637  .2837  .3052  .3286
   18   .1124  .1403  .1636  .1831  .2009  .2180  .2352  .2527  .2710  .2905  .3116
   19   .1094  .1364  .1589  .1775  .1945  .2106  .2267  .2430  .2599  .2777  .2967
   20   .1066  .1328  .1545  .1724  .1886  .2040  .2191  .2343  .2500  .2663  .2837
   21   .1040  .1295  .1505  .1677  .1832  .1978  .2121  .2264  .2411  .2562  .2722
   22   .1016  .1265  .1468  .1634  .1783  .1922  .2058  .2193  .2331  .2472  .2620
   23   .0994  .1236  .1433  .1594  .1738  .1871  .2000  .2129  .2258  .2391  .2529
   24   .0973  .1209  .1401  .1557  .1695  .1824  .1948  .2069  .2191  .2316  .2446
   25   .0954  .1184  .1371  .1522  .1656  .1779  .1898  .2015  .2131  .2249  .2370
   26   .0935  .1160  .1343  .1490  .1619  .1739  .1853  .1964  .2075  .2187  .2301
   27   .0918  .1138  .1316  .1459  .1585  .1700  .1810  .1917  .2023  .2130  .2238
   28   .0902  .1117  .1291  .1431  .1553  .1664  .1770  .1873  .1975  .2077  .2180
   29   .0885  .1097  .1268  .1404  .1522  .1631  .1733  .1832  .1930  .2028  .2126
   30   .0871  .1078  .1245  .1378  .1493  .1599  .1698  .1794  .1888  .1981  .2076
   31   .0857  .1060  .1224  .1354  .1466  .1569  .1665  .1758  .1848  .1938  .2029
   32   .0843  .1043  .1204  .1331  .1441  .1540  .1634  .1724  .1811  .1898  .1985
   33   .0831  .1027  .1184  .1309  .1416  .1514  .1605  .1692  .1776  .1860  .1944
   34   .0818  .1012  .1166  .1288  .1393  .1488  .1577  .1661  .1744  .1825  .1905
   35   .0807  .0997  .1149  .1268  .1371  .1464  .1550  .1632  .1713  .1791  .1869
   36   .0795  .0983  .1132  .1249  .1350  .1441  .1525  .1605  .1683  .1759  .1834
   37   .0784  .0969  .1116  .1231  .1330  .1418  .1501  .1579  .1655  .1729  .1802
   38   .0774  .0956  .1101  .1214  .1311  .1397  .1478  .1555  .1628  .1700  .1771
   39   .0764  .0943  .1086  .1197  .1292  .1377  .1456  .1531  .1603  .1673  .1741
   40   .0754  .0931  .1072  .1181  .1274  .1358  .1435  .1508  .1578  .1646  .1713
   42   .0736  .0909  .1045  .1151  .1241  .1322  .1396  .1466  .1533  .1597  .1661
   44   .0719  .0887  .1020  .1123  .1211  .1288  .1360  .1427  .1491  .1553  .1613
   46   .0703  .0868  .0997  .1097  .1182  .1257  .1326  .1391  .1453  .1512  .1570
   48   .0689  .0849  .0975  .1073  .1155  .1228  .1295  .1357  .1417  .1474  .1529
   50   .0675  .0832  .0955  .1050  .1130  .1201  .1266  .1326  .1383  .1438  .1492
   52   .0661  .0815  .0936  .1029  .1107  .1176  .1238  .1297  .1352  .1405  .1457
   54   .0649  .0800  .0918  .1009  .1085  .1152  .1213  .1270  .1323  .1375  .1425
   56   .0637  .0785  .0901  .0990  .1064  .1129  .1189  .1244  .1296  .1346  .1394
   58   .0626  .0771  .0885  .0972  .1045  .1108  .1166  .1220  .1271  .1319  .1366
   62   .0606  .0746  .0855  .0939  .1008  .1069  .1125  .1176  .1224  .1270  .1313
   66   .0587  .0723  .0828  .0909  .0975  .1034  .1087  .1136  .1182  .1225  .1266
   70   .0570  .0702  .0804  .0881  .0946  .1002  .1053  .1100  .1144  .1185  .1224
   74   .0554  .0682  .0781  .0856  .0919  .0973  .1022  .1067  .1109  .1149  .1186
   78   .0540  .0664  .0760  .0833  .0894  .0947  .0994  .1037  .1077  .1115  .1152
   82   .0527  .0648  .0741  .0812  .0871  .0922  .0968  .1009  .1048  .1085  .1120
   90   .0503  .0618  .0707  .0774  .0830  .0878  .0921  .0960  .0997  .1031  .1063
   98   .0482  .0592  .0677  .0741  .0794  .0840  .0880  .0917  .0952  .0984  .1014
  106   .0463  .0569  .0650  .0712  .0762  .0806  .0845  .0880  .0913  .0943  .0972
  114   .0447  .0549  .0627  .0686  .0734  .0776  .0813  .0847  .0878  .0907  .0934
  122   .0432  .0530  .0606  .0663  .0709  .0749  .0785  .0817  .0847  .0875  .0900
  138   .0406  .0498  .0569  .0622  .0666  .0703  .0736  .0766  .0794  .0819  .0843
  154   .0384  .0472  .0538  .0589  .0630  .0664  .0695  .0723  .0749  .0773  .0795
  170   .0366  .0449  .0512  .0560  .0599  .0632  .0661  .0687  .0711  .0734  .0755
  202   .0336  .0412  .0470  .0513  .0548  .0578  .0606  .0629  .0650  .0671  .0689
  234   .0312  .0382  .0436  .0476  .0509  .0537  .0561  .0583  .0603  .0621  .0639
  266   .0292  .0359  .0409  .0446  .0477  .0503  .0525  .0546  .0565  .0582  .0598
  330   .0262  .0322  .0367  .0400  .0428  .0451  .0471  .0489  .0505  .0521  .0535
  394   .0240  .0295  .0336  .0366  .0391  .0412  .0430  .0447  .0462  .0475  .0488
  522   .0209  .0256  .0292  .0318  .0339  .0357  .0373  .0387  .0400  .0412  .0423
  778   .0171  .0210  .0239  .0260  .0278  .0292  .0305  .0317  .0327  .0337  .0346
 1290   .0133  .0163  .0185  .0202  .0216  .0227  .0237  .0246  .0254  .0261  .0268
 2314   .0099  .0121  .0138  .0151  .0161  .0169  .0177  .0183  .0189  .0195  .0200

* If $d^2_{n,p}$ is a sufficiently unbiased estimate of $\sigma^2$, then the approximate probable error
to be feared in using $d_{n,p}$ as an estimate of $\sigma$ may be obtained by multiplying the
tabular entries by $\sigma$.

when observed data display trends, as it is believed that the method of differences
deserves much attention. In particular, it is hoped that someone will have the
time and ingenuity to calculate the distribution of the statistic


Were this done, an admirable criterion would be at hand for gauging the significance
of a change in the estimate of $\sigma$ as we pass from differences of order $p$ to
those of order $p + 1$. Of course, useful information in this connection could be
obtained from a knowledge of the distributions of $\delta^2_{n,p}$ and $\delta^2_{n,p+1}$; in fact their
variances as herein calculated give us a basis for somewhat reasonable conclusions.
An expression for the standard error of the difference between the
estimates of $\sigma$ from two consecutive series of finite differences is given in
[13, Chapter VI].

In connection with testing goodness of fit, it would be valuable also to know 
the distribution of 

Sl, P 

where $S^2_{n,p}$ is the estimate of variance derived from the least squares fitting of a
polynomial of degree $p$.

For convenience of reference, we conclude the paper with 

11. A concise description of the method and its precision. It frequently
happens that successive observations made at regular intervals are subject to
the same standard error $\sigma$ while the means of the populations from which they
are drawn display a trend. We give here a method of estimating the variance $\sigma^2$
and of determining the precision of our estimate. This method is primarily of
value when the trend is unknown; however, even when the type of trend is known,
its computational simplicity may make the method advantageous.


The method. Arrange the data in a vertical column and then in the usual
way form difference columns of order $1, 2, \cdots, p$. Sum the squares of the $p$th
order differences and divide by the number $\binom{2p}{p}(n - p)$. Our estimate of $\sigma^2$
is the number $d^2_{n,p}$, where

$$d^2_{n,p} = \frac{1}{\binom{2p}{p}(n - p)} \sum_{i=p+1}^{n} (\Delta^p x_i)^2.$$

¹ Dixon [9] gives moments of the statistic

$$\frac{\sum_{i=1}^{n} (x_i - 2x_{i+1} + x_{i+2})^2}{\sum_{i=1}^{n} (x_i - x_{i+1})^2},$$

where $x_{n+1} = x_1$ and $x_{n+2} = x_2$.
The precision. The precision of this estimate may be determined from the
following information (which has been derived in the present paper).

$$E(d^2_{n,p}) = \sigma^2 + v^2_{n,p};$$

$$v^2_{n,p} \leq \frac{1}{\binom{2p}{p}} \left(\frac{b - a}{n - p}\right) \left(\frac{b - a}{n - 1}\right)^{2p-1} \int_a^b \frac{\{f^{(p)}(s)\}^2\, ds}{b - a};$$

$$\operatorname{Var}(d^2_{n,p}) \leq \operatorname{Var}(\delta^2_{n,p}) + 4 v^2_{n,p}\sigma^2;$$

$$\operatorname{Var}(\delta^2_{n,p}) = \frac{2\sigma^4}{(n - 1)W(n, p)},$$

where $W(n, p)$ is given in Table I.


TABLE V

   p     $\sigma_x$     $\sigma_y$     $\sigma_z$
   1    18.90   184.62   11.22
   2     1.21     1.88   10.56
   3      .88     1.85   10.30
   4      .87     1.84   10.12
   5      .86     1.83   10.01


In case $v^2_{n,p}$ is sufficiently small (this is determined by the requirements of the
given problem), then Table IV may be used directly to determine the approximate
probable error in using $d_{n,p}$ as an estimate of $\sigma$.

An example. As a practical example of the use of the method of differences
when the trend is unknown and of the stability of the statistic $d^2_{n,p}$ with respect
to $p$, we mention a recent problem at Aberdeen Proving Ground which had to do
with estimating the accuracy with which certain photographic measurements
locate a moving object. Ballistic Cameras were used to determine horizontal
$x$ and $y$, and vertical $z$ coordinates (all in feet) of an airplane traveling about
160 mph at an elevation of about 35,000 feet. An automatic pilot was in use in
the airplane as it flew over a three mile course. At one second intervals for a
period of 70 seconds two Ballistic Cameras, 5000 feet apart, were used to locate
the plane. Since the plane was traveling pretty much in the $y$ direction one
would expect: that first differences would yield a standard error in $y$ far in excess
of its true one; that second differences would furnish a much better estimate;
and that perhaps third differences would yield a still more trustworthy one. No
matter what order of difference is used we never expect such an estimate to be
too small. In this problem, the standard errors in $x$, $y$, $z$ as estimated from differences
of certain orders, $p$, were as given in Table V.
THE EFFICIENCY OF SEQUENTIAL ESTIMATES AND WALD’S
EQUATION FOR SEQUENTIAL PROCESSES 


By J. Wolfowitz 
Columbia University 


1. Summary. Let n successive independent observations be made on the
same chance variable whose distribution function $f(x, \theta)$ depends on a single
parameter $\theta$. The number n is a chance variable which depends upon the
outcomes of successive observations; it is precisely defined in the text below. Let
$\theta^*(x_1, \ldots, x_n)$ be an estimate of $\theta$ whose bias is $b(\theta)$. Subject to certain
regularity conditions stated below, it is proved that

$$\sigma^2(\theta^*) \geq \left(1 + \frac{db}{d\theta}\right)^2 \left[En\, E\left(\frac{\partial \log f}{\partial \theta}\right)^2\right]^{-1}.$$

When $f(x, \theta)$ is the binomial distribution and $\theta^*$ is unbiased the lower bound
given here specializes to one first announced by Girshick [3], obtained under no
doubt different conditions of regularity. When the chance variable n is a
constant the lower bound given above is the same as that obtained in [2], page 480,
under different conditions of regularity.$^1$

Let the parameter $\theta$ consist of l components $\theta_1, \ldots, \theta_l$ for which there are
given the respective unbiased estimates $\theta_1^*(x_1, \ldots, x_n), \ldots, \theta_l^*(x_1, \ldots, x_n)$.
Let $\|\lambda_{ij}\|$ be the non-singular covariance matrix of the latter, and $\|\lambda^{ij}\|$ its
inverse. The concentration ellipsoid in the space of $(k_1, \ldots, k_l)$ is defined as

$$\sum_{i,j} \lambda^{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2.$$

(This valuable concept is due to Cramér.) If a unit mass be uniformly
distributed over the concentration ellipsoid, the matrix of its products of inertia
will coincide with the covariance matrix $\|\lambda_{ij}\|$. In [4] Cramér proves that no
matter what the unbiased estimates $\theta_1^*, \ldots, \theta_l^*$ (provided that certain
regularity conditions are fulfilled), when n is constant their concentration ellipsoid
always contains within itself the ellipsoid

$$\sum_{i,j} \mu'_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2$$

where

$$\mu'_{ij} = nE\left(\frac{\partial \log f}{\partial \theta_i}\,\frac{\partial \log f}{\partial \theta_j}\right).$$

1 To whom this result is to be ascribed is not clear from the context in which Professor
Cramér describes it (in [2]). After the present paper was completed the author learned of
the papers by Rao [8] and Aitken and Silverstone [9], both of which deal with this question.
The author is indebted to Prof. M. S. Bartlett for drawing his attention to these papers.
Consider now the sequential procedure of this paper. Let $\theta_1^*, \ldots, \theta_l^*$ be,
as before, unbiased estimates of $\theta_1, \ldots, \theta_l$, respectively, recalling, however,
that the number n of observations is a chance variable. It is proved that the
concentration ellipsoid of $\theta_1^*, \ldots, \theta_l^*$ always contains within itself the ellipsoid

$$\sum_{i,j} \mu_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2$$

where

$$\mu_{ij} = En\, E\left(\frac{\partial \log f}{\partial \theta_i}\,\frac{\partial \log f}{\partial \theta_j}\right).$$

When n is a constant this becomes Cramér’s result (under different conditions
of regularity).

In section 7 are presented a number of results related to the equation
$EZ_n = En\,EX$, which is due to Wald [6] and is fundamental for sequential
analysis.


2. Introduction. Let X be a chance variable whose distribution function
$f(x, \theta)$ depends on the parameter $\theta$. It is assumed that X either has a probability
density function (which we then denote by $f(x, \theta)$) or that it can take only
an at most denumerable number of discrete values (in the latter case
$f(x, \theta) = P\{X = x\}$, where the latter symbol denotes the probability of the
relation in braces). Let $\omega = x_1, x_2, \ldots$ be an infinite sequence of observations
on X, and let $\Omega$ be the space of "points" $\omega$. Let there be given an infinite
sequence of Borel measurable functions $\varphi_1(x_1), \varphi_2(x_1, x_2), \ldots, \varphi_j(x_1, \ldots, x_j), \ldots$
defined for all $\omega$ in $\Omega$, such that each takes only the values zero and one. It is
well known that the function $f(x, \theta)$ defines a measure (probability) on a Borel
field in $\Omega$. We assume that everywhere in $\Omega$, except possibly on a set whose
probability is zero for all $\theta$ under consideration, at least one of the functions
$\varphi_1, \varphi_2, \ldots$ takes the value one. Let $n(\omega)$ be the smallest integer at which this occurs.
Thus $n(\omega)$ is a chance variable.
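The definition of $n(\omega)$ as the first index at which one of the zero-one functions takes the value one can be sketched in Python. The names `stopping_time` and `first_success_or_three` are hypothetical, and a single callable that inspects the observations seen so far is assumed to stand in for the whole sequence $\varphi_1, \varphi_2, \ldots$.

```python
def stopping_time(observations, phi):
    """Return the smallest j with phi(x_1, ..., x_j) == 1, or None.

    `phi` is a hypothetical stand-in for the sequence of zero-one
    functions: it receives the list of observations seen so far.
    None is returned when the supplied (finite) sequence is exhausted
    before any phi_j takes the value one.
    """
    seen = []
    for x in observations:
        seen.append(x)
        if phi(seen) == 1:
            return len(seen)
    return None


def first_success_or_three(xs):
    """Illustrative rule: stop at the first 1, or after three observations."""
    return 1 if xs[-1] == 1 or len(xs) >= 3 else 0
```

The decision to stop after j observations uses only $x_1, \ldots, x_j$, which is exactly the property on which the later arguments rely.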

In statistical applications the chance variable $n(\omega)$ may be interpreted as a
rule for terminating a sequence of observations on the chance variable X, the
probability of termination being one, and the decision to terminate depending
only upon the observations obtained. A sequential test is an example of this
procedure. The converse is, however, not true, because the process described
above does not require that any statistical decision should be reached when the
process of drawing observations is terminated.

An "estimate" of $\theta$ is a function $\theta^*(x_1, \ldots, x_n)$ of the observations $x_1, \ldots, x_n$
(those obtained prior to the "termination" of the process of drawing
observations). In the sequel we shall limit ourselves to estimates whose second moments
are finite. The estimate is "unbiased" if $E\theta^*$, the expected value of $\theta^*$, is $\theta$.
When this is not so $E\theta^* - \theta$ is called the bias, $b(\theta)$, of $\theta^*$. In general the bias
is a function of $\theta$. It is obvious that the function $\theta^*$ may be undefined on a set
of points $(x_1, \ldots, x_n)$ whose probability is zero for all $\theta$ under consideration.
In the present paper we shall be concerned with an upper bound on the
efficiency of a sequential estimate, or, more precisely, with a lower bound on its
variance. This lower bound is intimately related to certain results on the
efficiency of the maximum likelihood estimate from a sample of fixed size. This is
not surprising since fixed-size sampling is a special instance of sequential
sampling. The results obtained in this paper are also obviously and intimately
related to those due to Cramér [4] and those described by him in [2], pp. 477-488.
Naturally the conditions of regularity (restrictions on $f(x, \theta)$, $\theta^*$, etc.) under
which the results are proved are different. For example, no restrictions on the
sequential sampling procedure need appear in the statement of a theorem which
deals only with samples of fixed size.

The argument below proceeds as if $f(x, \theta)$ were a probability density function.
The results apply equally well to the case where $f(x, \theta)$ is the probability function
of a discrete chance variable provided:

1). Integration is replaced by summation wherever this is obviously required.

2). The phrase "almost all points" in a Euclidean space of any finite
dimensionality is understood

a). as all points in the space with the possible exception of a set of Lebesgue
measure zero, when $f(x, \theta)$ is a probability density function.

b). as all points in the space with the possible exception of points one of whose
coordinates is a member of the set Z, when $f(x, \theta)$ is the probability function of a
discrete chance variable. The set Z consists of all points z such that $f(z, \theta) = 0$
identically for all $\theta$ under consideration.


3. Conditions of regularity. In this section we shall formulate the restrictions 
which we impose on /, the estimates, and the sequential process. They are 
intended to be such as will be satisfied in most cases of statistical interest. No 
doubt they can be weakened, but the author has decided against attempting to 
do so here. The list may seem long for two reasons. Seldom in the literature 
are the assumptions which, for example, lead to validation of differentiation 
under the integral sign etc., formulated explicitly. The presence of a sequential 
procedure means that additional restrictions must be imposed. 

In this section we assume that $\theta$ is a single parameter. The case where $\theta$ has
more than one component is treated later.

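(3.1). The parameter $\theta$ lies in an open interval D of the real line. D may consist
of the entire line or of an entire half-line.

(3.2). The derivative $\partial f/\partial \theta$ exists for all $\theta$ in D and almost all x. We define
$\partial \log f(x, \theta)/\partial \theta$ as zero whenever $f(x, \theta) = 0$; thus $\partial \log f(x, \theta)/\partial \theta$ is
defined for all $\theta$ in D and almost all x. We postulate that

$$E\,\frac{\partial \log f(x, \theta)}{\partial \theta} = 0$$

and that

$$E\left(\frac{\partial \log f(x, \theta)}{\partial \theta}\right)^2$$

be not zero for all $\theta$ in D.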
(3.3).

$$E\left(\sum_{i=1}^{n}\left|\frac{\partial \log f(x_i, \theta)}{\partial \theta}\right|\right)^2$$

exists for all $\theta$ in D.

(3.4). Let $R_j$, (j = 1, 2, ...), be the set of points $(x_1, \ldots, x_j)$ in the
j-dimensional Euclidean space such that

$$\varphi_i(x_1, \ldots, x_i) = 0, \qquad i = 1, 2, \ldots, j - 1,$$

$$\varphi_j(x_1, \ldots, x_j) = 1.$$

For any integral j there exists a non-negative L-measurable function $T_j(x_1, \ldots, x_j)$
such that

a).

$$\left|\theta^*(x_1, \ldots, x_j)\,\frac{\partial}{\partial \theta}\prod_{i=1}^{j} f(x_i, \theta)\right| < T_j(x_1, \ldots, x_j)$$

for all $\theta$ in D and almost all $(x_1, \ldots, x_j)$ in $R_j$.

b).

$$\int_{R_j} T_j(x_1, \ldots, x_j)\, dx_1 \cdots dx_j$$

is finite.

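(3.5). Let

$$t_j(\theta) = \int_{R_j} \theta^*(x_1, \ldots, x_j) \prod_{i=1}^{j} f(x_i, \theta)\, dx_i, \qquad (j = 1, 2, \ldots).$$

We postulate the uniform convergence of the series

$$\sum_{j=1}^{\infty} \frac{dt_j(\theta)}{d\theta}$$

(the existence of $dt_j(\theta)/d\theta$ is a consequence of Assumption (3.4)) for all $\theta$ in D.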


4. The case of one parameter. In this section we assume that $f(x, \theta)$ depends
on a single parameter $\theta$. In sections 5 and 6 we shall discuss the case when $\theta$
is a vector with more than one component.

We have

$$E\,\frac{\partial \log f(x, \theta)}{\partial \theta} = 0$$

by (3.2). Define the chance variable

$$Y_n = \sum_{i=1}^{n} \frac{\partial \log f(x_i, \theta)}{\partial \theta}.$$

By an argument almost identical with that of [1], Theorem 1, or of Theorem 7.1
below, we have

$$EY_n = 0. \tag{4.1}$$
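From Theorem 7.2 below we obtain

$$\sigma^2(Y_n) = En\, E\left(\frac{\partial \log f(x, \theta)}{\partial \theta}\right)^2. \tag{4.2}$$

Let $\theta^*(x_1, \ldots, x_n)$ be an estimate of $\theta$ such that

$$E\theta^* = \theta + b(\theta).$$

Then

$$\sum_{j=1}^{\infty} \int_{R_j} \theta^*(x_1, \ldots, x_j) \prod_{i=1}^{j} f(x_i, \theta)\, dx_i = \theta + b(\theta). \tag{4.3}$$

Differentiation of both members of (4.3) with respect to $\theta$ (Assumptions (3.4)
and (3.5)) gives

$$E\theta^* Y_n = 1 + \frac{db}{d\theta}. \tag{4.4}$$

From (4.1) it follows that (4.4) gives the covariance between $\theta^*$ and $Y_n$.
Hence, since the square of a covariance cannot exceed the product of the two
variances, we obtain from (4.2)

$$\sigma^2(\theta^*) \geq \left(1 + \frac{db}{d\theta}\right)^2 \left[En\, E\left(\frac{\partial \log f(x, \theta)}{\partial \theta}\right)^2\right]^{-1}. \tag{4.5}$$

When the bias $b(\theta)$ is constant, for example when $b(\theta) = 0$ in case $\theta^*$ is an
unbiased estimate, we have from (4.5)

$$\sigma^2(\theta^*) \geq \left[En\, E\left(\frac{\partial \log f(x, \theta)}{\partial \theta}\right)^2\right]^{-1}. \tag{4.6}$$

The equality sign in (4.6) will hold if $\theta^*$ may be written as $Z'(\theta)Y_n + Z''(\theta)$,
where $Z'$ and $Z''$ are functions of $\theta$. However, $\theta^*$ itself should not be a function
of $\theta$ if our argument is to remain valid. The subject is connected with the
question of the existence of a sufficient estimate.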

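Let $f(x, \theta)$ be defined as follows:

$$f(x, \theta) = \theta^x (1 - \theta)^{1-x}, \qquad (x = 0 \text{ or } 1;\ 0 < \theta < 1).$$

Then

$$\frac{\partial \log f(x, \theta)}{\partial \theta} = \frac{x}{\theta} - \frac{1 - x}{1 - \theta}, \qquad E\left(\frac{\partial \log f}{\partial \theta}\right)^2 = \frac{1}{\theta(1 - \theta)}.$$

Suppose $\theta^*$ is unbiased. Then $\sigma^2(\theta^*) \geq \theta(1 - \theta)(En)^{-1}$, a result first given by
Girshick [3] under unspecified regularity conditions.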
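The binomial bound can be checked exactly for small stopping rules by enumerating every stopped path. The sketch below is only an illustration under stated assumptions: the helper names (`stopped_paths`, `moments`, `fixed_rule`, `first_success_or_three`, `sample_mean`, `first_observation`) and the two rules chosen are hypothetical, and exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product


def stopped_paths(phi, max_len):
    """Yield every 0/1 path at which phi first equals 1 (length <= max_len)."""
    for j in range(1, max_len + 1):
        for path in product((0, 1), repeat=j):
            if phi(path) == 1 and all(phi(path[:i]) == 0 for i in range(1, j)):
                yield path


def moments(phi, estimate, theta, max_len):
    """Return (En, E theta*, Var theta*) by exact enumeration."""
    e_n = e_t = e_t2 = total = Fraction(0)
    for path in stopped_paths(phi, max_len):
        prob = Fraction(1)
        for x in path:
            prob *= theta if x == 1 else 1 - theta
        t = Fraction(estimate(path))
        total += prob
        e_n += prob * len(path)
        e_t += prob * t
        e_t2 += prob * t * t
    assert total == 1  # the rule must terminate with probability one
    return e_n, e_t, e_t2 - e_t ** 2


def fixed_rule(path):
    """Fixed-size sampling: always take exactly four observations."""
    return 1 if len(path) == 4 else 0


def first_success_or_three(path):
    """Sequential rule: stop at the first success, or after three trials."""
    return 1 if path[-1] == 1 or len(path) == 3 else 0


def sample_mean(path):
    return Fraction(sum(path), len(path))


def first_observation(path):
    """An unbiased (if wasteful) estimate usable under any stopping rule."""
    return path[0]


theta = Fraction(1, 3)
en_fixed, mean_fixed, var_fixed = moments(fixed_rule, sample_mean, theta, 4)
en_seq, mean_seq, var_seq = moments(first_success_or_three, first_observation, theta, 3)
bound_fixed = theta * (1 - theta) / en_fixed
bound_seq = theta * (1 - theta) / en_seq
```

In the fixed-size case the sample mean attains the bound; under the sequential rule the first observation is unbiased but its variance lies strictly above $\theta(1 - \theta)/En$.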

Let the functions $\varphi_1, \varphi_2, \ldots$ be such that $n(\omega)$ is a constant. We are then
dealing with samples of fixed size. The result (4.6) is then given in [2], p. 480,
under different conditions of regularity.


5. Regularity conditions for the case when $\theta$ has more than one component.

We suppose that $\theta = (\theta_1, \ldots, \theta_l)$ and that simultaneous estimates
$\theta_1^*(x_1, \ldots, x_n), \ldots, \theta_l^*(x_1, \ldots, x_n)$ of the components of $\theta$ are under discussion.
In the sequel we shall limit ourselves to the case when these estimates are all
unbiased.

We postulate the following regularity conditions which are sufficient to validate
section 6:

(5.1). The covariance matrix of the estimates $\theta_1^*, \ldots, \theta_l^*$ is non-singular for all
$\theta$ in D (this time D is an open interval of the l-dimensional parameter space).

(5.2). The conditions of section 3 are satisfied for each $\theta_i$ and $\theta_i^*$, (i = 1, ..., l).


6. The ellipsoid of concentration when $\theta$ has more than one component. Let

$$\theta = (\theta_1, \ldots, \theta_l).$$

We shall first describe briefly the result of Cramér [4] which refers to samples
of fixed size n > l. Let $\theta_i^*(x_1, \ldots, x_n)$ be an unbiased estimate of
$\theta_i$, (i = 1, ..., l). Let $\|\lambda_{ij}\|$ be the non-singular covariance matrix of the
$\theta_i^*$, and let $\|\lambda^{ij}\|$ be its inverse. The "ellipsoid of concentration" in the space
of points $(k_1, \ldots, k_l)$ is defined as

$$\sum_{i,j=1}^{l} \lambda^{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2. \tag{6.1}$$

If a unit mass be distributed uniformly over this ellipsoid it will have the point
$(\theta_1, \ldots, \theta_l)$ as its center of gravity and $\lambda_{ij}$ as its product of inertia about the
corresponding axes. Cramér proves that, subject to certain regularity
conditions, there is a fixed ellipsoid

$$\sum_{i,j} \mu'_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2 \tag{6.2}$$

where

$$\mu'_{ij} = nE\left(\frac{\partial \log f}{\partial \theta_i}\,\frac{\partial \log f}{\partial \theta_j}\right),$$

which is always contained entirely within the concentration ellipsoid of any set
of unbiased estimates. The two ellipsoids coincide only under certain
conditions, among which is that the $\theta_i^*$ be jointly sufficient estimates of the $\theta_i$.

Let us now consider the sequential procedure of this paper and postulate the
regularity conditions of section 5. Let

$$K = \|k_{ij}\|$$

be a matrix with real elements such that $|K| = 1$ and let

$$K^{-1} = \|k^{ij}\|$$

be its inverse. Let

$$\|\theta\| = \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_l \end{pmatrix}, \qquad \|\psi\| = \begin{pmatrix} \psi_1 \\ \vdots \\ \psi_l \end{pmatrix}$$

be column matrices. Suppose

$$\|\psi\| = K\|\theta\|. \tag{6.3}$$

Then

$$\|\theta\| = K^{-1}\|\psi\|. \tag{6.4}$$

Define

$$\|\psi^*\| = \begin{pmatrix} \psi_1^* \\ \vdots \\ \psi_l^* \end{pmatrix} = K\|\theta^*\|.$$

From section 4 we have

$$En\, E\left(\frac{\partial \log f(x, \theta)}{\partial \psi_1}\right)^2 \geq \left[\sigma^2(\psi_1^*)\right]^{-1} \tag{6.5}$$

where the differentiation by which $\partial \log f/\partial \psi_1$ is obtained is performed with $\psi_2, \ldots, \psi_l$
held constant. Consider the last (l - 1) rows of K as fixed and $(k_{11}, k_{12}, \ldots, k_{1l})$
as free to vary subject only to the restriction that $|K| = 1$. The left member
of (6.5) is then a fixed quantity, while the right member is a function of the first
row of K. The inequality (6.5) must remain valid for all admissible
$(k_{11}, \ldots, k_{1l})$. Hence (6.5) will remain valid if the right member of (6.5) is
replaced by its maximum with respect to $(k_{11}, \ldots, k_{1l})$. We shall obtain this
maximum and find that (6.5) then implies a result about the minimal ellipsoid
of concentration.$^2$

The problem is therefore to minimize $\sigma^2(\psi_1^*)$. Now

$$\sigma^2(\psi_1^*) = \sum_{i,j} \lambda_{ij} k_{1i} k_{1j}. \tag{6.6}$$

The family of ellipsoids in the space of $(k_{11}, \ldots, k_{1l})$

$$\sum_{i,j} \lambda_{ij} k_{1i} k_{1j} = c, \tag{6.7}$$

where c is a running parameter, has all centers located at the origin. Let

$$(k_{11}^0, \ldots, k_{1l}^0)$$

be the sought-for maximizing values of $(k_{11}, \ldots, k_{1l})$. From the definitions of
K and $K^{-1}$ we have

$$\sum_{i} k_{1i} k^{i1} = 1 \tag{6.8}$$

where $(k^{11}, k^{21}, \ldots, k^{l1})$ are constants. It follows that the minimum value
$c_0$ of $\sigma^2(\psi_1^*)$ is such that the ellipsoid

$$\sum_{i,j} \lambda_{ij} k_{1i} k_{1j} = c_0 \tag{6.9}$$

is tangent to the hyperplane (6.8) at the point $(k_{11}^0, \ldots, k_{1l}^0)$. Now the
tangent plane to (6.9) at this point is given by

$$\sum_{i,j} \lambda_{ij} k_{1i}^0 k_{1j} = c_0. \tag{6.10}$$
From (6.8) and (6.10) we obtain

$$c_0 k^{i1} = \sum_{j} \lambda_{ij} k_{1j}^0, \qquad (i = 1, \ldots, l). \tag{6.11}$$

Hence

$$c_0 \sum_{i} \lambda^{ij} k^{i1} = k_{1j}^0, \qquad (j = 1, \ldots, l), \tag{6.12}$$

from which

$$c_0 \sum_{i,j} \lambda^{ij} k^{i1} k^{j1} = 1. \tag{6.13}$$

We have

$$\frac{\partial \log f}{\partial \psi_1} = \sum_{i} k^{i1} \frac{\partial \log f}{\partial \theta_i}, \qquad \left(\frac{\partial \log f}{\partial \psi_1}\right)^2 = \sum_{i,j} k^{i1} k^{j1} \frac{\partial \log f}{\partial \theta_i}\,\frac{\partial \log f}{\partial \theta_j}. \tag{6.14}$$

From (6.5), (6.13), (6.14), and the definition of $c_0$ we conclude that

$$\sum_{i,j} \mu_{ij} k^{i1} k^{j1} \geq \sum_{i,j} \lambda^{ij} k^{i1} k^{j1} \tag{6.15}$$

where

$$\mu_{ij} = En\, E\left(\frac{\partial \log f}{\partial \theta_i}\,\frac{\partial \log f}{\partial \theta_j}\right). \tag{6.16}$$

We may restate (6.15) as follows. The concentration ellipsoid

$$\sum_{i,j} \lambda^{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2 \tag{6.17}$$

of the unbiased estimates $\theta_1^*, \ldots, \theta_l^*$ always contains within itself the ellipsoid

$$\sum_{i,j} \mu_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2 \tag{6.18}$$

where the $\mu_{ij}$ are defined by (6.16).

The question of the coincidence of the two ellipsoids is connected with the
question of the existence of sufficient estimates. It may be difficult to state
any general results about the concentration ellipsoid of biased estimates without
postulating some relationships among the biases and/or their derivatives.

7. On Wald’s equation and related results in sequential analysis. In section 4
we referred to a proof by Blackwell [1] of an equation due to Wald [5]
which is fundamental in the Wald theory of sequential tests of statistical
hypotheses. Here we shall give a perhaps simpler proof of this equation, and then prove
several new and related results of general interest for sequential analysis.

The results of Theorems 7.2 and 7.3 below can be obtained by differentiation
of Wald’s fundamental identity of sequential analysis ([6], [7]). However, the
conditions under which we obtain these results are less stringent than any so far
found sufficient to establish the identity and the validity of differentiating it.
Theorem 7.4 and its corollaries refer to sequential processes where the chance
variables may have different distributions or even be dependent. In the future
we hope to return to the question of finding all central moments of $Z_n$, the
problem of generalizing the fundamental identity, and related questions.

For Theorems 7.1, 7.2, and 7.3 we shall assume a chance variable X whose
cumulative distribution function F(x) is subject only to whatever restrictions
may be explicitly imposed on it in each theorem. We assume the existence of a
general sequential process such as is described above, which is subject only to
such restrictions as may be explicitly formulated in each theorem. The
sequential process of course defines the chance variable n. Let $x_1, x_2, \ldots$ be
successive independent observations on X. We define $Z_n = \sum_{i=1}^{n} x_i$. If E(X) and
$\sigma^2(X)$ exist we shall denote them by w and $\sigma^2$, respectively.

Theorem 7.1 (Wald [5], Blackwell [1]). Suppose w and En exist. Then

$$E(Z_n - nw) = 0. \tag{7.1}$$

The following theorem, which is a sort of partial converse of Theorem 7.1, is
proved concomitantly with Theorem 7.1:

Theorem 7.1.1. If $EZ_n$ exists, and if either $P\{X > 0\} = 0$ or $P\{X < 0\} = 0$,
then w and En both exist, and

$$EZ_n = wEn.$$

Actually the same proof suffices for a somewhat stronger form of
Theorem 7.1.1:

Theorem 7.1.2. If $EZ_n$ exists, and if

$$E(X_i \mid n = j) \geq 0 \quad (\text{or} \leq 0)$$

for all positive integral j such that $P\{n = j\} \neq 0$, and all $i \leq j$, then w and En
both exist, and

$$EZ_n = wEn.$$

Theorem 7.2. If $E\left(\sum_{i=1}^{n} |x_i - w|\right)^2$ exists, then $\sigma^2$ and En both exist, and

$$E(Z_n - nw)^2 = \sigma^2 En. \tag{7.2}$$

We have

$$\begin{aligned} E(Z_n - nw) = E\left(\sum_{i=1}^{n}(x_i - w)\right) &= \sum_{j=1}^{\infty}\int_{R_j}\left(\sum_{i=1}^{j}(x_i - w)\right)\prod_{m=1}^{j} dF(x_m) \\ &= \sum_{j=1}^{\infty}\sum_{i=j}^{\infty}\int_{R_i}(x_j - w)\prod_{m=1}^{i} dF(x_m). \end{aligned} \tag{7.3}$$

Also

$$\sum_{i=j}^{\infty}\int_{R_i}(x_j - w)\prod_{m=1}^{i} dF(x_m) = P\{n \geq j\}\,E(x_j - w) = 0. \tag{7.4}$$

Hence

$$\sum_{j=1}^{\infty}\sum_{i=j}^{\infty}\int_{R_i}(x_j - w)\prod_{m=1}^{i} dF(x_m) = 0. \tag{7.5}$$

From this (7.1) follows.

Suppose now that the conditions of Theorem 7.2 are fulfilled. We have

$$\begin{aligned} E(Z_n - nw)^2 &= \sum_{j=1}^{\infty}\int_{R_j}\left(\sum_{i=1}^{j}(x_i - w)\right)^2\prod_{m=1}^{j} dF(x_m) \\ &= \sum_{j=1}^{\infty}\sum_{i=j}^{\infty}\int_{R_i}(x_j - w)^2\prod_{m=1}^{i} dF(x_m) \\ &\quad + 2\sum_{j=2}^{\infty}\sum_{s=1}^{j-1}\sum_{i=j}^{\infty}\int_{R_i}(x_s - w)(x_j - w)\prod_{m=1}^{i} dF(x_m). \end{aligned} \tag{7.6}$$

Let s < j be any two positive integers. Then

$$\sum_{i=j}^{\infty}\int_{R_i}(x_s - w)(x_j - w)\prod_{m=1}^{i} dF(x_m) = 0. \tag{7.7}$$

Hence

$$\sum_{j=2}^{\infty}\sum_{s=1}^{j-1}\sum_{i=j}^{\infty}\int_{R_i}(x_s - w)(x_j - w)\prod_{m=1}^{i} dF(x_m) = 0. \tag{7.8}$$

In a similar manner we obtain

$$\sum_{i=j}^{\infty}\int_{R_i}(x_j - w)^2\prod_{m=1}^{i} dF(x_m) = \sigma^2 P\{n \geq j\}. \tag{7.9}$$

From (7.6), (7.8), and (7.9) it therefore follows that

$$E(Z_n - nw)^2 = \sigma^2\sum_{j=1}^{\infty} P\{n \geq j\} = \sigma^2\sum_{j=1}^{\infty} jP\{n = j\} = \sigma^2 En \tag{7.10}$$

which is the desired result.

It remains to prove the validity of rearranging the series in (7.3) and (7.6).
First, we have

$$\sum_{i=j}^{\infty}\int_{R_i}|x_j - w|\prod_{m=1}^{i} dF(x_m) = P\{n \geq j\}\,E|X - w|. \tag{7.11}$$

Hence it follows that

$$\begin{aligned} \sum_{j=1}^{\infty}\sum_{i=j}^{\infty}\int_{R_i}|x_j - w|\prod_{m=1}^{i} dF(x_m) &= \sum_{j=1}^{\infty} P\{n \geq j\}\,E|X - w| \\ &= E|X - w|\sum_{j=1}^{\infty} jP\{n = j\} = E|X - w|\,En. \end{aligned} \tag{7.12}$$

This justifies the rearrangement of terms in the series in (7.3). Second, the
series (7.6) is dominated by the series

$$\sum_{j=1}^{\infty}\sum_{i=j}^{\infty}\int_{R_i}(x_j - w)^2\prod_{m=1}^{i} dF(x_m) + 2\sum_{j=2}^{\infty}\sum_{s=1}^{j-1}\sum_{i=j}^{\infty}\int_{R_i}|x_s - w|\cdot|x_j - w|\prod_{m=1}^{i} dF(x_m) \tag{7.13}$$

all of whose terms are positive. The series (7.13) converges because

$$E\left(\sum_{i=1}^{n}|x_i - w|\right)^2 < +\infty. \tag{7.14}$$

Hence the rearrangement of the series (7.6) is valid.
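Both (7.1) and (7.2) can be verified exactly on a small example by enumerating all stopped paths. The following Python sketch is an illustration only; the function `wald_moments`, the two-point distribution, and the stopping rule `stop_rule` are hypothetical choices made for the check, and the rule is bounded so that every expectation is a finite sum.

```python
from fractions import Fraction
from itertools import product


def wald_moments(values, probs, stop, max_len):
    """Return (w, sigma^2, En, EZ_n, E(Z_n - n w)^2) by exact enumeration.

    `stop(path)` is a hypothetical rule returning True when sampling ends;
    it must stop every path by length `max_len`.
    """
    w = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - w) ** 2 for v, p in zip(values, probs))
    e_n = e_z = e_sq = Fraction(0)
    for j in range(1, max_len + 1):
        for idx in product(range(len(values)), repeat=j):
            path = [values[i] for i in idx]
            # Keep only paths that stop at j and did not stop earlier.
            if not stop(path) or any(stop(path[:i]) for i in range(1, j)):
                continue
            prob = Fraction(1)
            for i in idx:
                prob *= probs[i]
            z = sum(path)
            e_n += prob * j
            e_z += prob * z
            e_sq += prob * (z - j * w) ** 2
    return w, var, e_n, e_z, e_sq


def stop_rule(path):
    """Stop when the running sum reaches 2, or after three observations."""
    return sum(path) >= 2 or len(path) >= 3


half = Fraction(1, 2)
w, var, e_n, e_z, e_sq = wald_moments([-1, 2], [half, half], stop_rule, 3)
```

Here w = 1/2 and $\sigma^2$ = 9/4, and the enumeration gives En = 2, so that $EZ_n = wEn = 1$ and $E(Z_n - nw)^2 = \sigma^2 En = 9/2$.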

In the sequel we require certain sets $R'_j$ (j = 1, 2, ...) which we shall define
now. Let $R^*_{ij}$, $i \leq j$, be the totality of all points $(x_1, \ldots, x_j)$ such that

$$(x_1, \ldots, x_i) \in R_i. \tag{7.15}$$

Let $R^j$ be the j-dimensional Euclidean space. Then

$$R'_j = R^j - \sum_{i=1}^{j} R^*_{ij}. \tag{7.16}$$

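We shall now prove:

Theorem 7.3. Suppose that $E\left(\sum_{i=1}^{n}|x_i - w|\right)^3$ and $En\left(\sum_{i=1}^{n}|x_i - w|\right)$
exist.$^3$ Then

$$E(Z_n - nw)^3 = w_3 En + 3\sigma^2 En(Z_n - nw) \tag{7.17}$$

where

$$w_3 = E(X - w)^3.$$

3 The author has succeeded in proving that the existence of $E\left(\sum_{i=1}^{n}|x_i - w|\right)^3$ implies
the existence of $En\left(\sum_{i=1}^{n}|x_i - w|\right)$. The proof will be published subsequently in
connection with other results.

Proof: We have

$$\begin{aligned} E(Z_n - nw)^3 &= \sum_{j=1}^{\infty}\int_{R_j}\left[\sum_{i=1}^{j}(x_i - w)\right]^3\prod_{m=1}^{j} dF(x_m) \\ &= \sum_{j=1}^{\infty}\int_{R_j}\sum_{i=1}^{j}(x_i - w)^3\prod_{m=1}^{j} dF(x_m) \\ &\quad + 3\sum_{j=2}^{\infty}\int_{R_j}\sum_{t=2}^{j}\sum_{s=1}^{t-1}(x_s - w)(x_t - w)^2\prod_{m=1}^{j} dF(x_m) \\ &\quad + 3\sum_{j=2}^{\infty}\int_{R_j}\sum_{t=2}^{j}\sum_{s=1}^{t-1}(x_s - w)^2(x_t - w)\prod_{m=1}^{j} dF(x_m) \\ &\quad + 6\sum_{j=3}^{\infty}\int_{R_j}\sum_{t=3}^{j}\sum_{s=2}^{t-1}\sum_{r=1}^{s-1}(x_r - w)(x_s - w)(x_t - w)\prod_{m=1}^{j} dF(x_m). \end{aligned} \tag{7.18}$$

Considering the first term in the right member of (7.18), it follows that

$$\begin{aligned} \sum_{j=1}^{\infty}\int_{R_j}\sum_{i=1}^{j}(x_i - w)^3\prod_{m=1}^{j} dF(x_m) &= \sum_{i=1}^{\infty}\sum_{j=i}^{\infty}\int_{R_j}(x_i - w)^3\prod_{m=1}^{j} dF(x_m) \\ &= w_3\sum_{i=1}^{\infty} P\{n \geq i\} = w_3\sum_{i=1}^{\infty} iP\{n = i\} = w_3 En. \end{aligned} \tag{7.19}$$

All the rearrangements of terms in the operations involved in the proof of
Theorem 7.3 are legitimate because the various series are absolutely convergent.

As for the second term in the right member of (7.18), we have

$$\begin{aligned} \sum_{j=2}^{\infty}\int_{R_j}\sum_{t=2}^{j}\sum_{s=1}^{t-1}(x_s - w)(x_t - w)^2\prod_{m=1}^{j} dF(x_m) &= \sum_{s=1}^{\infty}\sum_{t=s+1}^{\infty}\sum_{j=t}^{\infty}\int_{R_j}(x_s - w)(x_t - w)^2\prod_{m=1}^{j} dF(x_m) \\ &= \sigma^2\sum_{s=1}^{\infty}\sum_{t=s+1}^{\infty}\int_{R'_{t-1}}(x_s - w)\prod_{m=1}^{t-1} dF(x_m) \\ &= \sigma^2\sum_{s=1}^{\infty}\sum_{t=s}^{\infty}\int_{R'_t}(x_s - w)\prod_{m=1}^{t} dF(x_m). \end{aligned} \tag{7.20}$$

We now operate on $En(Z_n - nw)$, and obtain

$$\begin{aligned} En(Z_n - nw) &= \sum_{j=1}^{\infty}\int_{R_j} j\sum_{i=1}^{j}(x_i - w)\prod_{m=1}^{j} dF(x_m) \\ &= \sum_{j=1}^{\infty}\sum_{i=j}^{\infty}\int_{R_i} i(x_j - w)\prod_{m=1}^{i} dF(x_m). \end{aligned} \tag{7.21}$$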
We observe that

$$\begin{aligned} \sum_{i=j}^{\infty}\int_{R_i} i(x_j - w)\prod_{m=1}^{i} dF(x_m) &= j\sum_{i=j}^{\infty}\int_{R_i}(x_j - w)\prod_{m=1}^{i} dF(x_m) \\ &\quad + \sum_{s=j+1}^{\infty}\sum_{i=s}^{\infty}\int_{R_i}(x_j - w)\prod_{m=1}^{i} dF(x_m). \end{aligned} \tag{7.22}$$

To evaluate the left member of (7.22), we proceed as follows: It is easy to see that

$$\sum_{i=j}^{\infty}\int_{R_i}(x_j - w)\prod_{m=1}^{i} dF(x_m) = 0. \tag{7.23}$$

Moreover, when s > j,

$$\sum_{i=s}^{\infty}\int_{R_i}(x_j - w)\prod_{m=1}^{i} dF(x_m) = \int_{R'_{s-1}}(x_j - w)\prod_{m=1}^{s-1} dF(x_m). \tag{7.24}$$

Hence

$$\sum_{i=j}^{\infty}\int_{R_i} i(x_j - w)\prod_{m=1}^{i} dF(x_m) = \sum_{s=j}^{\infty}\int_{R'_s}(x_j - w)\prod_{m=1}^{s} dF(x_m). \tag{7.25}$$

Therefore

$$En(Z_n - nw) = \sum_{j=1}^{\infty}\sum_{s=j}^{\infty}\int_{R'_s}(x_j - w)\prod_{m=1}^{s} dF(x_m). \tag{7.26}$$

It remains now to consider the third term of the right member of (7.18).
We have

$$\begin{aligned} \sum_{j=2}^{\infty}\int_{R_j}\sum_{t=2}^{j}\sum_{s=1}^{t-1}(x_s - w)^2(x_t - w)\prod_{m=1}^{j} dF(x_m) \\ = \sum_{s=1}^{\infty}\sum_{t=s+1}^{\infty}\sum_{j=t}^{\infty}\int_{R_j}(x_s - w)^2(x_t - w)\prod_{m=1}^{j} dF(x_m). \end{aligned} \tag{7.27}$$

Now, suppose that in the expression

$$v_{jts} = \int_{R_j}(x_s - w)^2(x_t - w)\prod_{m=1}^{j} dF(x_m) \tag{7.28}$$

where $j \geq t > s$, we integrate with respect to all $x_m$ for which $m \geq t$. Then
it is not difficult to see that

$$\sum_{j=t}^{\infty} v_{jts} = 0 \tag{7.29}$$

for all s and t such that $1 \leq s < t$. Hence from (7.27)

$$\sum_{j=2}^{\infty}\int_{R_j}\sum_{t=2}^{j}\sum_{s=1}^{t-1}(x_s - w)^2(x_t - w)\prod_{m=1}^{j} dF(x_m) = 0. \tag{7.30}$$
In a similar way it is shown that the fourth term of the right member of (7.18)
is zero.

The desired result (7.17) is a direct consequence of (7.18), (7.19), (7.20),
(7.26), and (7.30).

Consider now an infinite sequence of chance variables $X_1, X_2, \ldots$, which
need not have the same distribution and which may be dependent (in which
case they must satisfy the obvious consistency relationships). We take
successive observations on these chance variables and define a sequential process
as above, which is subject only to such restrictions as we shall explicitly state.
Let $Z_n$ maintain its previous definition.

Theorem 7.4. Suppose that

$$\nu_i = E(X_i \mid n \geq i) \tag{7.31}$$

exists for all positive integral i for which $P\{n \geq i\} \neq 0$. In those cases write

$$\nu'_i = E(|X_i - \nu_i| \mid n \geq i). \tag{7.32}$$

Suppose also that the series

$$\sum_{i=1}^{\infty}(\nu'_1 + \cdots + \nu'_i)P\{n = i\} \tag{7.33}$$

converges. Then

$$E\left(Z_n - \sum_{i=1}^{n}\nu_i\right) = 0. \tag{7.34}$$

It is regrettable but unavoidable that the mean values $\nu_i$ and $\nu'_i$ entering into
(7.33) and (7.34) be conditional. The fundamental reason is that the sequential
process may drastically modify the distribution of dependent chance variables,
so that their distribution for our purposes can only be considered in conjunction
with the sequential process itself. Consider the following example:

$$P\{X_1 = -1\} = \tfrac{1}{2}, \qquad P\{X_1 = 1\} = \tfrac{1}{2}$$

$$P\{X_2 = -2 \mid X_1 = -1\} = \tfrac{1}{2}$$

$$P\{X_2 = -1 \mid X_1 = -1\} = \tfrac{1}{2}$$

$$P\{X_2 = 1 \mid X_1 = 1\} = \tfrac{1}{2}$$

$$P\{X_2 = 2 \mid X_1 = 1\} = \tfrac{1}{2}$$

We have $E(X_2) = 0$. Suppose we define the following sequential process:
If $X_1 = -1$, n = 1, and if $X_1 = 1$, n = 2. It is then clear that for our purposes
$X_2$ can take no negative values and the fact that $E(X_2) = 0$ is of no use to us.
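The example can be worked through numerically. The Python sketch below assumes that every probability in the example equals 1/2 (the only common value for which $E(X_2) = 0$ under the symmetric pattern shown); the variable names are hypothetical. It contrasts the unconditional mean of $X_2$ with the conditional mean $\nu_2 = E(X_2 \mid n \geq 2)$ used in Theorem 7.4.

```python
from fractions import Fraction

# Joint distribution of (X1, X2); all probabilities assumed equal to 1/2.
quarter = Fraction(1, 4)
joint = {(-1, -2): quarter, (-1, -1): quarter, (1, 1): quarter, (1, 2): quarter}


def n_of(x1):
    """Sequential process of the example: n = 1 if X1 = -1, n = 2 if X1 = 1."""
    return 1 if x1 == -1 else 2


# Unconditional mean of X2.
e_x2 = sum(p * x2 for (x1, x2), p in joint.items())

# Conditional means nu_1 = E(X1 | n >= 1) and nu_2 = E(X2 | n >= 2).
nu1 = sum(p * x1 for (x1, x2), p in joint.items())
p_n_ge_2 = sum(p for (x1, x2), p in joint.items() if n_of(x1) >= 2)
nu2 = sum(p * x2 for (x1, x2), p in joint.items() if n_of(x1) >= 2) / p_n_ge_2

# E(Z_n), and E(Z_n - sum of nu_i over i <= n).
e_zn = sum(p * (x1 + (x2 if n_of(x1) == 2 else 0)) for (x1, x2), p in joint.items())
e_centered = sum(
    p * ((x1 - nu1) + ((x2 - nu2) if n_of(x1) == 2 else 0))
    for (x1, x2), p in joint.items()
)
```

Centering with the unconditional means $E(X_1) = E(X_2) = 0$ would leave $EZ_n = 3/4$, whereas centering with $\nu_1 = 0$ and $\nu_2 = 3/2$ gives zero, as the theorem asserts.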
If, however, the chance variables $X_1, X_2, \ldots$ are independent, this difficulty
disappears, and we have the following

Corollary 1 to Theorem 7.4. If the chance variables $X_1, X_2, \ldots$ are
independent, we have Theorem 7.4 with $\nu_i = E(X_i)$, and $\nu'_i = E|X_i - \nu_i|$.

If further all the $X_i$ have the same distribution, we see that Theorem 7.1 is
a special case of Theorem 7.4, since the convergence of the series (7.33) is then
a consequence of the existence of w and En. From this argument we see,
however, that it is not necessary that all the $X_i$ have the same distribution, and we
may write the following generalization of Theorem 7.1:

Corollary 2 to Theorem 7.4. Let the $X_i$ be independent with, in general,
different distributions. Suppose, however, that all $\nu_i$ are equal, and all $\nu'_i$ are equal,
except perhaps for those i such that $P\{n \geq i\} = 0$. Suppose further that En exists.
Then (7.1) holds.

Among possible fields of application of Theorem 7.4 are sequential tests of
composite statistical hypotheses, and the random walk of a particle governed
by probability distributions which are functions of time and the position of the
particle. The extension of this theorem to vector chance variables is
straightforward. The extension to higher moments may present difficulties. We hope
to return to some of these questions in the future.

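Proof of Theorem 7.4. This is very elementary. We have

$$\begin{aligned} E\left(Z_n - \sum_{i=1}^{n}\nu_i\right) &= \sum_{j=1}^{\infty}\int_{R_j}\left[\sum_{i=1}^{j}(x_i - \nu_i)\right] dF(x_1, \ldots, x_j) \\ &= \sum_{j=1}^{\infty}\sum_{i=j}^{\infty}\int_{R_i}(x_j - \nu_j)\, dF(x_1, \ldots, x_i) \\ &= \sum_{j=1}^{\infty} P\{n \geq j\}\,E(X_j - \nu_j \mid n \geq j) = 0. \end{aligned} \tag{7.35}$$

The rearrangement of the series is valid because

$$\begin{aligned} \sum_{j=1}^{\infty}\sum_{i=j}^{\infty}\int_{R_i}|x_j - \nu_j|\, dF(x_1, \ldots, x_i) &= \sum_{j=1}^{\infty}\nu'_j P\{n \geq j\} \\ &= \sum_{j=1}^{\infty}(\nu'_1 + \cdots + \nu'_j)P\{n = j\} \end{aligned} \tag{7.36}$$

which converges by (7.33).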
ESTIMATION OF LINEAR FUNCTIONS OF CELL PROPORTIONS

By John H. Smith 
Bureau of Labor Statistics 

Summary. In this article certain contributions are made to the theory of
estimating linear functions of cell proportions in connection with the methods
of (1) least squares, (2) minimum chi-square, and (3) maximum likelihood.
Distinctions among these three methods made by previous writers arise out of
(1) confusion concerning theoretical vs. practical weights, (2) neglect of effects
of correlation between sampling errors, and (3) disagreement concerning methods
of minimization. Throughout the paper the equivalence of these three methods
from a practical point of view has been emphasized in order to facilitate the
integration and adaptation of existing statistical techniques. To this end:

1. The method of least squares as derived by Gauss in 1821-23 [6, pp. 224-
228] in which weights in theory are chosen so as to minimize sampling variances
is herein called the ideal method of least squares and the theoretical estimates
are called ideal linear estimates. This approach avoids confusion between
practical approximations and theoretical exact weights.

2. The ideal method of least squares is applied to uncorrelated linear
functions of correlated sample frequencies to determine the appropriate quantity
to minimize in order to derive ideal linear estimates in sample-frequency
problems. This approach leads to a sum of squares of standardized uncorrelated
linear functions of sampling errors in which statistics are to be substituted in
numerators.

3. A new elementary method is used to reduce the sum of squares in (2),
before substitution of statistics, to Pearson’s expression for chi-square. In
this result, obtained without approximation, appropriate substitution of
statistics shows that the denominators of chi-square should be treated as constant
parameters in the differentiation process in order to minimize chi-square in
conformity with the ideal method of least squares.

4. The ideal method of minimum chi-square, derived in (3) as the sample-
frequency form of the ideal method of least squares, yields ideal linear estimates
in terms of the unknown parameters in the denominators of chi-square. When
these parameters are estimated by successive approximations in such a way as
to be consistent with statistics based on them, it is shown that the method of
minimum chi-square leads to maximum likelihood statistics.

5. An iterative method which converges to maximum likelihood estimates is
developed for the case in which observations are cross-classified and first order
totals are known. In comparison with Deming’s asymptotically efficient
statistics, it is shown that, in a certain sense, maximum likelihood statistics
are superior for any given value of n, especially in small samples.

6. The method of proportional distribution of marginal adjustments is
developed. This method yields estimates of expected cell frequencies whose
efficiency is 100 per cent when universe cell frequencies are proportional, a
condition closely approximated in most practical surveys for which first order
totals are available from complete censuses. Whether this favorable condition
is satisfied or not, the method yields results which are easy to interpret and it
has many computational advantages from the point of view of economy of time
and effort.

Throughout the article discussion is confined to the estimation of parameters
whose relationships to cell proportions are linear. However, most of the results
can be extended to the case of non-linear relationships, the necessary qualifica¬ 
tions being similar to those in curve-fitting problems when the function to be 
fitted is not linear in its parameters. In this case, of course, least squares esti¬ 
mates are not linear estimates. In particular, obvious extensions of the general 
proofs in sections 5 and 6 make them applicable to the non-linear case. Thus 
even when relationships are non-linear, it can be shown that the method of 
minimum chi-square is the sample-frequency form of the method of least squares 
which leads (by means of appropriate successive approximations) to maximum 
likelihood statistics in sample-frequency problems. This principle which 
establishes the equivalence of the methods of least squares, minimum chi-square, 
and maximum likelihood greatly facilitates the integration and adaptation of 
existing techniques developed in connection with these important methods of 
estimation. 

1. Introduction. This article deals with problems of statistical estimation in
which the parameters to be estimated are cell proportions or linear functions of
them. A simple illustration of this type of problem is that of estimating $p$,
the proportion of white men in a population classified by race and sex. From
a sample of $n$ persons selected at random from such a population, the desired
proportion can be estimated by simply taking the sample proportion of white
men as an estimate of the corresponding cell proportion in the population or
universe. This estimate is unbiased for all possible values of $p$ and its sampling
variance is $p(1 - p)/n$, assuming, for simplicity, that sampling is done with
replacements. Whether a more accurate unbiased estimate of $p$ can be derived
depends on whether or not any other relevant information concerning the cell
proportions in the universe is available. For example, it may be known that
all of the white portion of the population is composed of married couples so that
in the universe the number of white men is exactly equal to the number of white
women. This knowledge implies that half the proportion of whites provides an
unbiased estimate of $p$ which is far more accurate than the sample proportion
of white men. In fact, the sampling variance of half the proportion of whites
is equal to $(2p)(1 - 2p)/4n$, which is less than half the sampling variance of the
proportion of white men.
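The two variances quoted above can be checked directly from the multinomial variances and covariances of the cell frequencies. The following is only a minimal sketch and is not part of the original article; the function names and the values n = 100 and p = 0.3 are illustrative assumptions.

```python
def variance_sample_proportion(p, n):
    """Sampling variance of n1/n, the sample proportion of white men."""
    return p * (1 - p) / n


def variance_half_whites(p, n):
    """Sampling variance of (n1 + n2)/(2n), half the sample proportion of whites.

    Uses Var(n_i) = n p_i (1 - p_i) and Cov(n_1, n_2) = -n p_1 p_2
    with p_1 = p_2 = p (sampling with replacement).
    """
    var_n1 = n * p * (1 - p)
    var_n2 = n * p * (1 - p)
    cov_n1_n2 = -n * p * p
    return (var_n1 + var_n2 + 2 * cov_n1_n2) / (2 * n) ** 2


p, n = 0.3, 100  # illustrative values only; p must be below 1/2
v_men = variance_sample_proportion(p, n)
v_half = variance_half_whites(p, n)
ratio = v_half / v_men  # algebraically (1 - 2p) / (2 (1 - p)), always below 1/2
print(v_men, v_half, ratio)
```

The ratio depends only on p, which is why the advantage of half the proportion of whites holds for every sample size.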

The term ideal linear estimate will be used to refer to any statistic which
satisfies the criteria of estimation implied by the foregoing discussion; that is, an
ideal linear estimate is any estimate which (1) is a linear function of the sample
observations; (2) is recognizable as unbiased by the research worker; and (3)
has minimum sampling variance among estimates which have properties (1)
and (2). These important criteria of estimation will now be stated in more
technical language.

Let $n_1$, $n_2$, and $n_3$ represent the number of (1) white men, (2) white women,
and (3) non-white persons, respectively, in samples of $n$ persons. Since any
linear function with a constant term can be reduced to the homogeneous form
by adding an appropriate multiple of the identity

(1.1)  $n_1 + n_2 + n_3 - n = 0,$

it is possible, without loss of generality, to confine attention to linear estimates
of the form

(1.2)  $T = a_1 n_1 + a_2 n_2 + a_3 n_3,$

which are recognizable as unbiased. In this example, the research worker is
assumed to know that the cell proportions in the universe are

(1.3)  $p_1,\; p_2,\; p_3 = p,\; p,\; 1 - 2p.$

Hence, absence of bias implies that the expected value of $T$,

(1.4)  $E(T) = a_1 n p_1 + a_2 n p_2 + a_3 n p_3$

       $\phantom{E(T)} = (a_1 + a_2 - 2a_3)\,np + n a_3,$

is identically equal to $p$; in other words, that

(1.5)  $n(a_1 + a_2 - 2a_3) - 1 = 0$

and

       $n a_3 = 0.$

The ideal linear estimate is derived by finding values of $a_1$, $a_2$, and $a_3$ which
minimize the sampling variance of $T$ subject to equations (1.5) as side
conditions.¹ In this way it can be shown that half the sample proportion of whites
is actually the ideal linear estimate of $p$. For more general problems, the
process of minimization of sampling variances with the aid of Lagrange
multipliers involves expressions which are complicated algebraically. For this reason
it is usually easier to derive ideal linear estimates of parameters which are linear
functions of cell proportions by the ideal method of least squares which is
presented in section 4.
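For the three-cell example, the minimization can be carried out by the one-variable route: the side conditions (1.5) fix the third coefficient at zero and the second at 1/n minus the first, leaving a quadratic in a single coefficient. The sketch below is illustrative only; the function names and the values n = 100, p = 0.3 are assumptions, and the vertex is located from three evaluations of the quadratic rather than by symbolic algebra.

```python
def variance_of_T(a, n, p):
    """Sampling variance of T = a*n1 + (1/n - a)*n2 + 0*n3.

    The side conditions (1.5) force a3 = 0 and a2 = 1/n - a1, so the
    variance depends on the single coefficient a = a1.
    """
    a1, a2 = a, 1.0 / n - a
    var_cell = n * p * (1 - p)   # Var(n1) = Var(n2)
    cov_cells = -n * p * p       # Cov(n1, n2)
    return a1 * a1 * var_cell + a2 * a2 * var_cell + 2 * a1 * a2 * cov_cells


def minimizing_coefficient(n, p):
    """Vertex of the quadratic a -> variance_of_T(a), from three evaluations."""
    f_minus, f_zero, f_plus = (variance_of_T(a, n, p) for a in (-1.0, 0.0, 1.0))
    curvature = (f_plus - 2 * f_zero + f_minus) / 2   # coefficient of a^2
    slope = (f_plus - f_minus) / 2                    # coefficient of a
    return -slope / (2 * curvature)


n, p = 100, 0.3                      # illustrative values only
a_best = minimizing_coefficient(n, p)
min_variance = variance_of_T(a_best, n, p)
print(a_best, min_variance)
```

The minimizing coefficient comes out as 1/(2n) for both white cells, so T is half the sample proportion of whites, and the minimum variance agrees with $(2p)(1 - 2p)/4n$.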

Like other least squares estimates, an ideal linear estimate of a linear function
of cell proportions depends on ideal least squares weights. Since these weights
are, in general, functions of variances and covariances of sample frequencies,
the theoretical connotation of the term “ideal” makes it preferable to other
terms such as “optimum” and “best.” In this connection it should be
emphasized that (1) the sampling variance of linear estimates is insensitive to
small errors in estimating ideal weights, and (2) the process of deriving practical
approximations to ideal linear estimates automatically provides maximum
likelihood estimates of the ideal weights. Thus the estimation of weights is
perfectly objective and the best practical approximations to ideal linear
estimates are expressed in terms of sample observations. This degree of
objectivity is rare in statistical estimation, as a brief consideration of regression
problems will illustrate.

¹ In this example, it is possible to solve equations (1.5) for $a_2$ in terms of $a_1$, drop
subscripts, and substitute in the formula for the sampling variance of $T$ to obtain a quadratic
in $a$ to be minimized.

In ordinary regression problems, the ideal weights are inversely proportional
to error variances. It is usually necessary to draw upon past experience to
estimate relative weights because satisfactory estimates of error variances
are rarely available in terms of sample observations. From the present point
of view, the widespread use of equal weights implies the subjective “assumption”
that all error variances are equal. (Maximum likelihood estimates of regression
coefficients require, in addition, the even more subjective assumption of
normality.) In spite of these (usually implicit) subjective assumptions,
discussions of optimum properties of least squares regression coefficients based on
ideal weights in terms of unknown parameters are highly commendable because
(1) sampling variance is not very sensitive to small errors in weights and (2)
properties of theoretical ideal linear estimates furnish a simple basis for
discussion of the properties of practical statistics based on any reasonably good
approximations to the exact ideal weights. In any case, it is important to
know what the ideal weights are in terms of unknown parameters because
research workers can make better estimates if they know what quantities should
be estimated than they could otherwise.

2. Estimation of a single parameter. In sample-frequency problems, least
squares weights are rarely given explicitly or even implied by information
available to the research worker. Since the hypothetical example used in
Section 1 is a trivial special case from this point of view, a more realistic
example is presented in this section. Since the biological interpretation of this
problem is presented in detail in all but the first of the many editions of Fisher’s
well-known book [3], it is sufficient here to consider only the statistical problem.
The four cell proportions are

(2.1)  $p_1,\; p_2,\; p_3,\; p_4 = (2 + \theta)/4,\; (1 - \theta)/4,\; (1 - \theta)/4,\; \theta/4,$

and the parameter $\theta$ is to be estimated from the set of sample frequencies

(2.2)  $n_1,\; n_2,\; n_3,\; n_4 = 1997,\; 906,\; 904,\; 32,$

obtained in a sample of $n = 3839$ selected at random from an infinite universe.
Fisher considers five different statistics, $T_1$, $T_2$, $T_3$, $T_4$, and $T_5$, so it will
be convenient to use the symbol $T_6$ for the ideal linear estimate. Consider
the class of linear unbiased estimates of the form

(2.3)  $T = a_1 n_1 + a_2 n_2 + a_3 n_3 + a_4 n_4,$

where absence of bias implies that

(2.4)  $2a_1 + a_2 + a_3 = 0$

and

       $a_1 - a_2 - a_3 + a_4 - 4/n = 0.$

Minimizing the sampling variance of $T$ in equation (2.3) subject to side
conditions based on equations (2.4) yields the ideal linear estimate $T_6$ defined
by the equation

(2.5)  $n(1 + 2\theta)\,T_6 = 3\theta n_1 - 3\theta n_2 - 3\theta n_3 + (4 - \theta) n_4.$

The exact sampling variance of $T_6$,

(2.6)  $\sigma_6^2 = \dfrac{2\theta(1 - \theta)(2 + \theta)}{n(1 + 2\theta)},$

is used by Fisher as the asymptotic sampling variance of any efficient estimate
of $\theta$. The exact sampling variance of the ideal linear estimate is especially
appropriate as the asymptotic sampling variance of the maximum likelihood
estimate $T_4$ because $T_4$ is the limit of an iterative process designed to estimate
$T_6$ as closely as possible from sample data by using successive approximations
to $T_6$ for $\theta$ in equation (2.5). The limit of this process (which is, of course,
only an approximation to $T_6$) can be obtained by substituting the symbol $T_4$
for both $T_6$ and $\theta$ in equation (2.5) and solving the resulting quadratic equation,
which can be reduced to

(2.7)  $n T_4^2 - (n_1 - 2n_2 - 2n_3 - n_4)\,T_4 - 2n_4 = 0,$

an equation which is identical, except for notation, with Fisher’s equation of
maximum likelihood of which $T_4$ is the positive solution.
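The iterative process just described can be sketched numerically with the sample frequencies of equation (2.2). This is an illustration under stated assumptions rather than a procedure given in the article: the starting value 0.5, the tolerance, and all function names are hypothetical choices.

```python
import math

COUNTS = (1997, 906, 904, 32)  # sample frequencies from equation (2.2)


def ideal_estimate(theta, counts):
    """Solve equation (2.5) for T_6 at a trial value of theta."""
    n1, n2, n3, n4 = counts
    n = sum(counts)
    numerator = 3 * theta * (n1 - n2 - n3) + (4 - theta) * n4
    return numerator / (n * (1 + 2 * theta))


def iterate_to_limit(counts, start=0.5, tol=1e-12, max_iter=200):
    """Feed each approximation to T_6 back into (2.5) as the next theta."""
    theta = start
    for _ in range(max_iter):
        new_theta = ideal_estimate(theta, counts)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


def positive_root(counts):
    """Positive solution of the quadratic (2.7)."""
    n1, n2, n3, n4 = counts
    n = sum(counts)
    b = n1 - 2 * n2 - 2 * n3 - n4
    return (b + math.sqrt(b * b + 8 * n * n4)) / (2 * n)


limit = iterate_to_limit(COUNTS)
root = positive_root(COUNTS)
print(round(limit, 6), round(root, 6))
```

Both routes should give the same value to within rounding error, which is the sense in which the maximum likelihood estimate is the limit of the successive approximations.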

The foregoing result is a comparatively simple illustration of the general
principle that the maximum likelihood estimate of any linear function of cell
proportions is the limit of an iterative process designed to approximate the
corresponding linear estimate as closely as possible by means of sample
frequencies. Since the accuracy of estimates of least squares relative weights
increases with size of sample, maximum likelihood statistics have, in an
asymptotic sense for large samples, the same optimum properties which are possessed
in an exact sense (even for small samples) by the corresponding ideal linear
estimates. Thus the results obtained by means of the theory of large samples
are supported by the approach to estimation problems by means of ideal linear
estimates. In addition, the latter approach facilitates the integration of
available techniques as explained in later sections.

It is true that the optimum properties of maximum likelihood statistics can
be presented in terms of the theory of large samples, but the fact that a given
method of estimation yields a statistic whose asymptotic sampling variance is
a minimum does not imply that the same technique will yield a minimum
variance statistic for any given small value of $n$. For example, it is well known
that the median is a maximum likelihood estimate of the midpoint of a double
exponential universe. Nevertheless, in samples of three observations from
such a universe, another statistic (4/9 of the mean plus 5/9 of the median)
has greater relative advantage over the median than the median has over the
mean.

Fisher’s discussion of the relative efficiencies of his five alternative consistent
statistics suggests that it is impossible to formulate objective criteria for making
choices among alternative statistics such that each statistic will be used whenever
its sampling variance is smallest. Consider the sequence of universes generated
by letting $\theta$ vary from zero to unity. In general, each value of $\theta$ would
determine which of Fisher’s five statistics would have smallest sampling variance
for that particular universe for any given value of $n$. In comparison with
any other single statistic, the statistic $T_4$ would usually have smaller sampling
variance, but there are notable exceptions. For example, in the absence of
linkage when $\theta$ is equal to one-fourth, the statistic $T_2$ is the ideal linear estimate
and its sampling variance is smaller than that of $T_4$, at least for certain small
values of $n$. For this reason, Fisher used $T_2$ in preference to $T_4$ as the basis for
testing the significance of linkage. The statistic $T_5$, derived by Fisher’s method
of minimum chi-square, is also of special interest. Fisher’s method of minimum
chi-square yields statistics which differ from the corresponding maximum
likelihood statistics because Fisher considers the denominators as variables in
the process of differentiation instead of considering them as unknown
parameters to be estimated by identifying them with the corresponding statistics
in the numerators after differentiation. Arguments of later sections tend to
show that the latter method is more appropriate. In this example, it can be
shown that if $T_5$ were substituted for the corresponding parameter in the
denominators of chi-square (and treated as a parameter) the minimization of
chi-square with respect to statistics in its numerators would be exactly equivalent
to substituting 0.035785, the numerical value of $T_5$, for $\theta$ in equation (2.5) and
solving for $T_6$ to obtain 0.035717, a value which is much closer to 0.035712,
the numerical value of the maximum likelihood estimate $T_4$, than to Fisher’s $T_5$.
In problems of estimation chi-square should be minimized in order to obtain
efficient statistics, not to obtain a small criterion for testing goodness of fit,
and it should be minimized in a manner consistent with this purpose. Whether
or not it is possible to derive an even smaller value for a quantity called
chi-square should be considered to be irrelevant in either estimation problems or
tests of significance. It is difficult to present these ideas in more technical
language because it is possible to construct trivial hypothetical universes for
which Fisher’s method of minimum chi-square provides statistics which are
superior in certain respects to the corresponding maximum likelihood statistics.
Nevertheless, it seems clear that the ideal linear estimate usually has smaller
sampling variance than the maximum likelihood statistic which, in turn, usually
has smaller sampling variance than any other given practical statistic.
Evidence presented in later sections tends to show that these advantages are more
important in small samples than in cases in which the theory of large samples
is applicable.
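The single substitution described in this paragraph is easy to reproduce. The sketch below takes the two numerical values 0.035785 and 0.035712 from the text as given (it does not recompute Fisher's minimum chi-square statistic) and evaluates equation (2.5) once; the names are illustrative assumptions.

```python
COUNTS = (1997, 906, 904, 32)  # sample frequencies from equation (2.2)
T4_QUOTED = 0.035712           # maximum likelihood estimate, as quoted in the text
T5_QUOTED = 0.035785           # Fisher's minimum chi-square statistic, as quoted


def ideal_estimate(theta, counts):
    """Solve equation (2.5) for T_6 at a trial value of theta."""
    n1, n2, n3, n4 = counts
    n = sum(counts)
    numerator = 3 * theta * (n1 - n2 - n3) + (4 - theta) * n4
    return numerator / (n * (1 + 2 * theta))


t6_at_t5 = ideal_estimate(T5_QUOTED, COUNTS)
distance_to_t4 = abs(t6_at_t5 - T4_QUOTED)
distance_to_t5 = abs(t6_at_t5 - T5_QUOTED)
print(round(t6_at_t5, 6), distance_to_t4 < distance_to_t5)
```

One pass through (2.5) already moves the value most of the way from the minimum chi-square statistic toward the maximum likelihood estimate.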

3. The “ideal” method of least squares. When sample observations are
uncorrelated in successive samples and parameters to be estimated are linear
functions of the expected values of the sample observations, the method of least
squares yields ideal linear estimates of the parameters provided that the weight of
each observation is inversely proportional to its variance in successive samples.
Although the minimum sampling variance property among linear unbiased
estimates is seldom stressed, this principle of weighting has been presented in
connection with the method of least squares for more than a hundred years.
In order to emphasize the theoretical nature of weights which depend on
variances which are usually unknown in practice and to distinguish the method
based on such weights from the more familiar method of least squares with
equal weights, the method which yields ideal linear estimates will be called the
ideal method of least squares.

Discussion of the general problem of estimating linear functions of cell
proportions can be facilitated by making use of results obtained by other writers,
notably Gauss (as reported by Whittaker and Robinson [6]) and Pearson [4].
According to Whittaker and Robinson, “the first writer to connect the method
[of ideal least squares] with the theory of probability was Gauss” [6, p. 224].
In his Theoria Motus proof of 1809, Gauss derived the “most probable value”
[6, p. 223] of a parameter (i.e., the statistic which satisfies the criterion now
called maximum likelihood) for the case in which sample observations are
statistically independent and normally distributed. In his Theoria Combinationis
proof of 1821-23, Gauss “abandoned the ‘metaphysical’ basis” [6, p. 220] of
his earlier work and derived the method herein called the ideal method of least
squares (without approximation) from the criteria of (1) minimum variance and
(2) absence of bias for the case in which “the mean value of [the covariance of
a pair of errors] is zero” [6, p. 224]. Since the covariances of uncorrelated linear
functions are zero whether they are statistically independent or not, it follows
from the work of Gauss that the ideal method of least squares applied to
uncorrelated linear functions of sample frequencies yields ideal linear estimates.
In other words, the ideal method of least squares implies the following six steps:

1. From the set of $k + 1$ sample frequencies construct $k$ linear functions
which are uncorrelated in successive samples.

2. From each function subtract its expected value in terms of the unknown
parameters to find its sampling error.

3. Write the ratio of each sampling error to its own standard error in the
form of a fraction.

4. Sum the squares of these standardized uncorrelated sampling errors to
obtain a quantity called chi-square.

5. Substitute statistics² for the parameters in the numerators of chi-square.

6. Minimize the sum of squares of residuals with respect to each statistic
in turn (subject to appropriate side conditions in case linear functions
not implied in preceding steps are known).

This series of six steps can be summarized by the single statement that the
function to minimize is the sum of squares of standardized uncorrelated
residuals. Actually this statement is oversimplified because even though sampling
errors are both uncorrelated and standardized, the corresponding residuals
are, in general, neither standardized nor uncorrelated.

4. Pearson’s expression for chi-square. As defined by Pearson [4],
chi-square is the sum of squares of a set of $k$ standardized uncorrelated linear
functions of sampling errors in a set of $k + 1$ correlated sample frequencies. A set
of $k$ standardized uncorrelated linear functions can be constructed in an infinite
number of ways, but each set can be obtained from any of the others by means
of an orthogonal transformation. Thus the sum of squares is the same no
matter what set is originally chosen. As his set of standardized uncorrelated
linear functions, Pearson chose those determined by the axes of the correlation
ellipse for which he gave the required sum of squares in terms of “minors” or
cofactors of the correlation determinant of the first $k$ sample frequencies.
Pearson reduced this complicated expression to the now familiar form

(4.1)  $\chi^2 = \sum_{i=1}^{k+1} \dfrac{(n_i - n p_i)^2}{n p_i},$

where $p_i$ is the proportion in the $i$th cell in the universe and $n_i$ is the frequency
in the $i$th cell of a sample of $n$ observations selected at random from an infinite
universe (or with replacements from a finite universe).

The widespread misunderstanding of the nature of chi-square seems to be
based primarily on the facts that

1. Pearson’s rule for degrees of freedom is inadequate (see section 5), and

2. Pearson’s expression for chi-square can be derived by approximate methods
as well as by exact methods.

Pearson’s derivation of the expression for chi-square by exact methods is
sufficient to show that its derivation by approximate methods involves a paradox
in which different sets of approximations offset each other; however, Pearson’s
article is relatively inaccessible and, in addition, his algebraic reductions involve
the minors of a general determinant of the $k$th order. For these reasons, the
following exact derivation is presented in terms of elementary algebra.

² It is convenient to call these variable symbols “statistics”; the quantities whose
squares are summed, “residuals”; and the whole expression “chi-square,” even though,
from a certain point of view, these terms are strictly applicable only after the
minimization process. This usage should always be clear from its context.

Since the sum of squares is the same for any set of $k$ standardized uncorrelated
linear functions of the sampling errors in $k + 1$ correlated frequencies, a set should
be chosen for which the algebraic reductions are as easy as possible. From this
point of view a satisfactory set, which can be written in any of three forms, is
given by

(4.2)  $y_i = p_{i+} e_{i-} + (p_i + p_{i+})\, e_{i+}$

       $\phantom{y_i} = p_i e_{i+} - p_{i+} e_i$

       $\phantom{y_i} = -p_i e_{i-} - (p_i + p_{i+})\, e_i,$

where $e_i = n_i - n p_i$ and $i+$ and $i-$ refer to classes formed by combining all
classes above the $i$th class and below the $i$th class, respectively.

By means of the known variances and covariances of the sample frequencies
in expected value form,

(4.3)  $E(e_i^2) = n p_i (1 - p_i),$

and

(4.4)  $E(e_i e_j) = -n p_i p_j,$

it can be shown that the variance of $y_i$ is

(4.5)  $E(y_i^2) = n p_i p_{i+} (p_i + p_{i+}),$

and, by using the third expression in equation (4.2) for $y_i$ and the second for
$y_j$, it can be shown that any pair of $y$’s are uncorrelated because

(4.6)  $E(y_i y_j) = 0, \quad (i < j).$

Let $z_i$ represent the variable $y_i$ expressed in standard-deviation units. The
square of this standardized uncorrelated linear function of correlated sampling
errors can be written

(4.7)  $z_i^2 = \dfrac{(p_i e_{i+} - p_{i+} e_i)^2}{n p_i p_{i+} (p_i + p_{i+})}.$

It remains to show that Pearson’s expression for chi-square can be obtained
by adding the $k$ values of $z_i^2$ in succession. For this purpose it is convenient
to define

(4.8)  $\chi_r^2 = \sum_{i=1}^{r} \dfrac{e_i^2}{n p_i} + \dfrac{e_{r+}^2}{n p_{r+}},$

obtained by combining all classes above the $r$th class.

When $r = k$, the expression in equation (4.8) is the expression to be derived.
It remains to show that $\chi_k^2$ is the sum of squares of $k$ standardized uncorrelated
linear functions of sampling errors; i.e.,

(4.9)  $\chi_k^2 = \sum_{i=1}^{k} z_i^2.$

For the first cell $e_{1+} = -e_1$ and $p_{1+} = 1 - p_1$. Hence $y_1$ reduces to the negative
of the error in the first frequency and

(4.10)  $\chi_1^2 = e_1^2 / n p_1 (1 - p_1)$

        $\phantom{\chi_1^2} = e_1^2 / n p_1 + e_{1+}^2 / n p_{1+}, \quad (p_{1+} = 1 - p_1),$

a special case expressed in the required form. The general case is established
by showing that

(4.11)  $\chi_{r-1}^2 + z_r^2 = \chi_r^2,$

or, alternatively, that

(4.12)  $z_r^2 = \chi_r^2 - \chi_{r-1}^2$

        $\phantom{z_r^2} = \dfrac{e_r^2}{n p_r} + \dfrac{e_{r+}^2}{n p_{r+}} - \dfrac{(e_r + e_{r+})^2}{n (p_r + p_{r+})}$

        $\phantom{z_r^2} = \dfrac{(p_{r+} e_r^2 + p_r e_{r+}^2)(p_r + p_{r+}) - p_r p_{r+} (e_r^2 + 2 e_r e_{r+} + e_{r+}^2)}{n p_r p_{r+} (p_r + p_{r+})}$

        $\phantom{z_r^2} = \dfrac{p_{r+}^2 e_r^2 - 2 p_r p_{r+} e_r e_{r+} + p_r^2 e_{r+}^2}{n p_r p_{r+} (p_r + p_{r+})} = \dfrac{(p_r e_{r+} - p_{r+} e_r)^2}{n p_r p_{r+} (p_r + p_{r+})},$

thus establishing the derivation of Pearson’s expression for chi-square.
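The identity (4.9) can also be checked numerically for any set of frequencies. The sketch below is an illustration only; the four-cell counts and proportions are made-up values, and the function names are assumptions. It builds each $z_i^2$ from the form given in (4.7) and compares the total with Pearson's expression (4.1).

```python
import math


def pearson_chi_square(counts, probs):
    """Pearson's expression (4.1): sum of (n_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def standardized_squares(counts, probs):
    """The k values z_i^2 of equation (4.7), one for each of the first k cells."""
    n = sum(counts)
    errors = [c - n * p for c, p in zip(counts, probs)]
    squares = []
    for i in range(len(counts) - 1):
        p_i, e_i = probs[i], errors[i]
        p_plus = sum(probs[i + 1:])    # p_{i+}: all classes above the ith
        e_plus = sum(errors[i + 1:])   # e_{i+}: combined error of those classes
        y_i = p_i * e_plus - p_plus * e_i
        squares.append(y_i ** 2 / (n * p_i * p_plus * (p_i + p_plus)))
    return squares


counts = (12, 18, 33, 37)        # hypothetical sample frequencies, n = 100
probs = (0.1, 0.2, 0.3, 0.4)     # hypothetical universe proportions
chi_square = pearson_chi_square(counts, probs)
z_squares = standardized_squares(counts, probs)
print(chi_square, sum(z_squares))
```

For four cells there are three squares, and their sum reproduces the four-term Pearson expression.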

When sampling is done without replacement each variance and covariance
is multiplied by $(N - n)/(N - 1)$ where $N$ is the number of observations in
the universe. Hence, chi-square for this case can be written

(4.13)  $\chi^2 = \dfrac{N - 1}{N - n} \sum \dfrac{e_i^2}{n p_i}.$

This expression shows that the factor involving sampling errors is the same
whether sampling is done with replacement or without replacement. Hence,
the derivation of least squares statistics is the same for either method of sampling,
but sampling variances for the simpler case are multiplied by the factor
$(N - n)/(N - 1)$ when sampling is done without replacement.


5. The method of minimum chi-square. The derivation of Pearson’s
expression for chi-square completes the first four steps of the ideal method of least
squares outlined in section 3. Hence, the method of minimum chi-square is
the sample-frequency form of the ideal method of least squares in which only
two of the six steps remain to be taken.

In his original article [4] Pearson pointed out that the use of statistics instead
of parameters would affect the value of chi-square but that such effects would
usually be so small that no allowance need be made for them in connection with
tests of significance. It is now well known that the average value of chi-square
is reduced approximately one unit for each parameter estimated from the sample,
and that the main portion of this effect is on the numerators; i.e., in large samples
the substitution of statistics for parameters in the denominators usually
has a negligible effect on the value of chi-square. By confining the discussion
to the case in which parameters are used in the denominators, it is possible to
make simple exact statements concerning the main effects in terms of the number
of squares of standardized uncorrelated linear functions (also known as the
number of degrees of freedom) and the mean value of chi-square.

When the expected values in the numerators of chi-square can be expressed
as linear functions of $r$ algebraically independent parameters, ideal linear
estimates of the $r$ parameters are determined by substituting statistics for the $r$
parameters and minimizing the resulting expression with respect to each
statistic. In general, such a substitution of statistics for parameters in the
numerators of chi-square reduces the number of degrees of freedom by one unit for
every parameter estimated; that is, the appropriately minimized chi-square
can be analyzed into $k - r$ squares of standardized uncorrelated linear functions
of sampling errors.

The $r$ ideal linear estimates are linear functions of the sample frequencies.
Let $(v_1, v_2, \ldots, v_r)$ be a set of standardized uncorrelated linear functions of
the correlated sampling errors in these statistics and let $(v_1, v_2, \ldots, v_k)$ be a set
of linear functions obtained from the $z$’s of section 4 by an orthogonal
transformation. Since the sum of squares is not changed by such a transformation,
chi-square is the sum of the $k$ values of $v_i^2$. The process of substituting
statistics for the $r$ parameters in the numerators of chi-square reduces the values of
the first $r$ $v_i$’s to zero without affecting the values of the other $(k - r)$ $v_i$’s.

Thus the appropriately minimized chi-square can be analyzed into $k - r$
squares of standardized uncorrelated linear functions of sampling errors and is
therefore said to have $k - r$ degrees of freedom. The mean value of each square
is the variance of a standardized linear function of sampling errors and is
therefore unity by definition. Hence the mean value of the appropriately minimized
chi-square (with parameters in the denominators) is exactly $k - r$ when $r$
statistics are estimated from a set of $k + 1$ sample frequencies.
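In the simplest case, $r = 0$ (no parameters estimated, so that the true $p_i$ appear in both numerators and denominators), the mean value can be worked out directly from Pearson's expression (4.1) and the variances (4.3). The following lines are a sketch of that special case only, not of the general argument:

```latex
E(\chi^2) = \sum_{i=1}^{k+1} \frac{E(e_i^2)}{n p_i}
          = \sum_{i=1}^{k+1} \frac{n p_i (1 - p_i)}{n p_i}
          = \sum_{i=1}^{k+1} (1 - p_i)
          = (k + 1) - 1
          = k .
```

The last step uses only the fact that the $k + 1$ cell proportions sum to unity.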

The expression to be minimized is

(5.1)  $\chi^2 = \sum \dfrac{(n_i - m_i')^2}{n p_i},$

where $m_i'$ is the ideal linear estimate of $n p_i$. The set of statistics described
by the equation

(5.2)  $m_i' = n_i$

reduces the value of chi-square to zero, its minimum value. This shows that
the sample cell proportion is the ideal linear estimate of the corresponding
parameter.

Whenever a linear function independent of the sum of the cell proportions is
known, it is possible to take advantage of additional information provided by
the known function by minimizing chi-square subject to an appropriate side
condition. When side conditions are used in this way, the number of degrees
of freedom for the minimized chi-square is equal to the number of side conditions
which are algebraically independent of each other (and of the sum of the cell
proportions). Let the known linear function be written

(5.3)  $\sum a_i n p_i - m = 0.$

In order to facilitate comparison of the typical equation of maximization
with the corresponding equation of the method of maximum likelihood, it is
convenient to minimize chi-square by maximizing $-\chi^2/2$ subject to a side
condition based on (5.3). The function to be maximized can be written

(5.4)  $-\chi^2/2 = -\sum \dfrac{(n_i - m_i')^2}{2 n p_i} + h \left( \sum a_i m_i' - m \right),$

where $h$ is a Lagrange multiplier. Setting the partial derivative of $-\chi^2/2$
with respect to $m_i'$ equal to zero, the typical equation for minimizing chi-square
can be written

(5.5)  $(n_i - m_i')/n p_i + h a_i = 0,$

a form which shows that, in general, ideal linear estimates are defined in terms
of unknown parameters. Fortunately, these parameters can usually be
approximated closely by an iterative process. Substituting $m_i$ for both $n p_i$ and $m_i'$
in equations (5.5), the typical equation in the limiting values of such a process
can be reduced to

(5.6)  $n_i/m_i - 1 + h a_i = 0,$

a form which is identical with the typical equation (6.6) of maximum likelihood
derived in section 6. This equality of typical equations implies that whenever
the denominators of chi-square are estimated in such a way as to be consistent
with least squares statistics based on them, the method of minimum chi-square
always leads (by means of approximations necessary in practice) to maximum
likelihood estimates of parameters which are linear functions of cell proportions.
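The iterative process implied by equations (5.5) and (5.6) can be sketched for the three-cell example of section 1, where the known linear function is the equality of the first two cells. Everything below is an illustrative assumption rather than the article's own computation: the counts are made up, the side condition is taken in homogeneous form with coefficients (1, -1, 0) and m = 0, the sample frequencies serve as starting denominators, and the function name is hypothetical.

```python
def minimum_chi_square_iteration(counts, coeffs, iterations=20):
    """Repeat (5.5) with the latest estimates used as denominators.

    Each pass solves (n_i - m_i)/d_i + h*a_i = 0 together with the side
    condition sum(a_i * m_i) = 0, where d_i are the current denominators.
    """
    denominators = [float(c) for c in counts]   # start from the sample frequencies
    estimates = list(denominators)
    for _ in range(iterations):
        # Lagrange multiplier h chosen so that the side condition holds.
        h = -sum(a * c for a, c in zip(coeffs, counts)) / sum(
            a * a * d for a, d in zip(coeffs, denominators)
        )
        estimates = [c + h * a * d for c, a, d in zip(counts, coeffs, denominators)]
        denominators = estimates                 # make denominators consistent
    return estimates


counts = (30, 20, 50)    # hypothetical frequencies: white men, white women, others
coeffs = (1, -1, 0)      # side condition: expected white men = expected white women
limits = minimum_chi_square_iteration(counts, coeffs)
print(limits)
```

With these counts the first pass gives 24 for each white cell and later passes settle at 25, the average of the two white frequencies, which is the maximum likelihood estimate under the same side condition. Note that the total is restored to n only at the limit, since the sum condition is not imposed explicitly in this sketch.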

6. The method of maximum likelihood. Maximum likelihood estimates of
linear functions of cell proportions can be obtained by (1) expressing the
probability function (general term of the multinomial expansion) in terms of the $r$
parameters to be estimated; (2) substituting $r$ statistics for the $r$ parameters;
and (3) maximizing with respect to the $r$ statistics. In practice, this is usually
accomplished by maximizing the logarithm of the variable factor in step (3)
which can be written

(6.1)  $L = \sum n_i \log m_i,$

where $m_i$ is the maximum likelihood estimate of $n p_i$, the expected value of the
$i$th frequency $n_i$ in a sample of $n$ observations classified into $(k + 1)$ classes or
cells. It is evident that $L$ as written has no maximum with respect to any $m_i$
since it increases without bound as $m_i$ increases, but it sometimes has a uniquely
determined maximum when each of the $m_i$’s is written explicitly in terms of
less than $k + 1$ algebraically independent statistics. In the general case it is
easier to maximize $L$ subject to an appropriate set of side conditions, one of
which must be equivalent to

(6.2)  $m_1 + m_2 + \cdots + m_{k+1} - n = 0.$

When no linear function except the sum is known, the likelihood function
can be written

(6.3)  $L = \sum n_i \log m_i - \left( \sum m_i - n \right),$

a function which, subject to equation (6.2), is always equal to that in equation
(6.1) but which has a uniquely determined maximum. The typical equation of
maximum likelihood, obtained by setting the partial derivative of $L$ with respect
to $m_i$ equal to zero, is

(6.4)  $n_i/m_i - 1 = 0,$

an equation which shows that each sample frequency is a maximum likelihood
estimate of its own expected value.

When a linear function such as that in equation (5.3) is known, an improved
set of maximum likelihood statistics can be found by maximizing

(6.5)  $L = \sum n_i \log m_i - \left( \sum m_i - n \right) + h \left( \sum a_i m_i - m \right).$

The typical equation of maximization is found to be

(6.6)  $n_i/m_i - 1 + h a_i = 0,$

an equation which, as stated above, is identical with equation (5.6). Since
equation (5.6) was obtained as the limit of an iterative process from the typical
equation (5.5) for minimizing chi-square subject to the same side condition
and since each additional side condition affects the typical equation of each
method in exactly the same way, the method of minimum chi-square and the
method of maximum likelihood are equivalent for the general case in the sense
that the method of minimum chi-square always leads to maximum likelihood
statistics as limits of an iterative process.

7. Second-order tables with known expected marginal totals. As stated in 
section 2 , the integration of available techniques is facilitated by' regarding 
maximum likelihood statistics as the • best practical approximations to the 
corresponding ideal linear estimates. Since this important principle may not 
be immediately obvious, it will be illustrated for the important special case of 
|econd-oider tables for which the expected marginal totals are known 

Consider a sample of n observations arranged on two bases of classification 
and presented in a table containing r rows and s columns. The universe of N 
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observations has been completely enumerated and classified on each basis 
separately but not cross-classified; i.e., universe totals of first order classes are 
known. 

For the cell in the ith row and the jth column, let $p_{ij}$ represent the universe
cell proportion; $n_{ij}$, the sample frequency; $np_{ij}$, the expected value of $n_{ij}$;
and $m_{ij}$, the maximum likelihood estimate of $np_{ij}$. Indicating summation
by substituting a dot for the letter over which summation is to be performed,
the known marginal totals satisfy the equations

(7.1)    $Np_{i.} - N_{i.} = 0,$

         $Np_{.j} - N_{.j} = 0,$

where $p_{i.}$ and $p_{.j}$ are the universe proportions and $N_{i.}$ and $N_{.j}$ are the known
universe totals in the ith row and the jth column, respectively.

When n observations of a random sample are arranged according to two 
bases of classification in a table with r rows and s columns for which the r + s 
marginal totals are known, the typical equation of maximum likelihood can 
be obtained by maximizing, subject to side conditions based on equations (7.1), 
the likelihood function 

(7.2)    $L = \sum_i \sum_j n_{ij} \log m_{ij} - \sum_i a_i (m_{i.} - np_{i.}) - \sum_j b_j (m_{.j} - np_{.j}),$

with respect to the maximum likelihood estimates $m_{ij}$, where $a_i$ and $b_j$ are typical
Lagrange multipliers. Setting the partial derivative with respect to $m_{ij}$ equal
to zero and transposing, the typical equation of maximum likelihood can be
written

(7.3)    $n_{ij}/m_{ij} = a_i + b_j.$

Since equations (7.3) are not linear in their unknowns, the reader’s first 
reaction might well be to agree with a certain anonymous critic that “their 
solution is difficult.” This impression of great difficulty is probably the chief 
reason that previous writers have not used the method of maximum likelihood 
for this type of problem even after they had developed a set of techniques
adequate for the solution of the equations of maximum likelihood. In other words,
all that was needed was the integration of available techniques as will now 
be shown. 

In 1940, Deming and Stephan [2] derived a set of normal equations for the 
adjustment of a set of second-order cell frequencies to known expected marginal 
totals by the method of least squares in which each sample frequency is weighted 
by its own reciprocal. This method yields statistics which are efficient according 
to the theory of large samples, but they do not satisfy the criterion of maximum 
likelihood exactly. In the same article was presented an easier method of
iterative proportions, which, unfortunately, does not yield least squares
statistics. In 1942, Stephan [5] developed an improved iterative process which
yields statistics which satisfy the criterion of least squares with arbitrarily
chosen weights. The foregoing developments are presented in greater detail
in Deming’s book [1] in which Deming adapts Stephan’s iterative method to
the particular case in which each sample frequency is weighted by its own
reciprocal so as to yield solutions for the normal equations derived in the joint
article [2].

In Deming’s notation, equation 8 of Stephan’s article [5, p. 169] can be written

(7.4)    $m_{ij} = c_{ij}(p_i + q_j - 1) + n_{ij},$

an expression obtained by substituting $c_{ij}$ for $np_{ij}$ in the denominators of
chi-square and minimizing with respect to the statistics in the numerators. Hence,
if exact values of the $np_{ij}$ were used for the $c_{ij}$, the Stephan iterative method
would yield ideal linear estimates. Unless these parameters are implied by
some hypothesis to be tested, it is necessary, in practice, to estimate the $np_{ij}$
from sample data. In order to secure maximum likelihood estimates of expected
cell frequencies by means of the Stephan iterative method, the adjusted
frequencies based on first approximations to the $c_{ij}$ should be used as second
approximations to the $c_{ij}$, etc. In this way, maximum likelihood statistics can
be derived to any desired degree of approximation. At this point it should
be emphasized that the preceding statement applies not only to the class of
problems considered in this section but also to the wider class of problems for
which the Stephan iterative method provides solutions.
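
The successive-approximation scheme just described can be sketched numerically. The following is only an illustration under stated assumptions, not the computing routine published by Stephan or Deming: it takes equation (7.4) to have the form $m_{ij} = c_{ij}(p_i + q_j - 1) + n_{ij}$, finds the row and column terms by simple alternating sweeps, and then reuses each adjusted table as the next set of $c_{ij}$. The function names, the number of sweeps, and the two-by-two data are hypothetical choices made for the sketch.

```python
def adjust(n, c, row_tot, col_tot, sweeps=200):
    """One weighted least-squares adjustment of n to known marginal totals.

    Finds u[i], v[j] so that m[i][j] = n[i][j] + c[i][j] * (u[i] + v[j])
    has the required row and column totals; u and v play the role of
    p_i - 0.5 and q_j - 0.5 in equation (7.4).
    """
    r, s = len(n), len(n[0])
    d_row = [row_tot[i] - sum(n[i]) for i in range(r)]
    d_col = [col_tot[j] - sum(n[i][j] for i in range(r)) for j in range(s)]
    u, v = [0.0] * r, [0.0] * s
    for _ in range(sweeps):  # alternating sweeps over rows, then columns
        for i in range(r):
            u[i] = (d_row[i] - sum(c[i][j] * v[j] for j in range(s))) / sum(c[i])
        for j in range(s):
            col_c = sum(c[i][j] for i in range(r))
            v[j] = (d_col[j] - sum(c[i][j] * u[i] for i in range(r))) / col_c
    return [[n[i][j] + c[i][j] * (u[i] + v[j]) for j in range(s)]
            for i in range(r)]


def successive_approximations(n, row_tot, col_tot, rounds=20):
    """Start with c = n, then use each adjusted table as the next c."""
    c = [row[:] for row in n]
    history = []
    for _ in range(rounds):
        c = adjust(n, c, row_tot, col_tot)
        history.append(c)
    return history


# Hypothetical two-by-two sample with all expected marginal totals equal to 5.
n = [[1.0, 4.0], [3.0, 2.0]]
history = successive_approximations(n, row_tot=[5.0, 5.0], col_tot=[5.0, 5.0])
first, last = history[0], history[-1]
```

At a fixed point, $c_{ij} = m_{ij}$ gives $n_{ij} = m_{ij}(2 - p_i - q_j)$, which has the form of equation (7.3) with $a_i = 1 - p_i$ and $b_j = 1 - q_j$. The first pass, which weights each frequency by its own reciprocal, is the stage at which Deming's adaptation stops.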

Unfortunately, theoretical discussions of previous writers contain confusing
compensating errors which (1) present their own methods in an unnecessarily
unfavorable light and (2) increase the difficulties involved in the introduction
of the improvements in techniques suggested in section 9 which involve some
degree of adaptation of techniques already available. For these reasons, it
seems necessary to follow the arguments of previous writers in order to show
the points at which improvements are needed. This can be done most
effectively in connection with Deming’s book [1] where the method of least squares
is presented in great detail.

For the special case in which the sampling errors in the observations are
uncorrelated, the ideal criterion of least squares implies that the weight of each
observation should be inversely proportional to its sampling variance. This
criterion is accepted as well known by Deming who says that “the principle of
least squares requires the minimizing of the sum of the weighted squares of the
residuals” [1, p. 14] where “the weights of two functions are inversely
proportional to their variances” [1, p. 22]. Deming assumes that “there is no
correlation between the errors in the observations” with the qualification that
“this assumption covers a wide class of problems, but does fail to cover some”
[1, p. 49]. This assumption of uncorrelated errors is not applicable to
sample-frequency problems, of course, because the sample frequencies are correlated
with each other in such a way that the reciprocals of the ideal least squares
weights are not proportional to the sampling variances $np_{ij}q_{ij}$ but rather to
the expected frequencies $np_{ij}$ which appear in the denominators of chi-square.
In this connection it is interesting to note that Deming himself insists that
“there is only one principle of least squares, namely, the minimizing of $\chi^2$”
[1, p. 51]. However, the method currently in use for the minimizing of chi-square
was that given by Fisher [3] which leads to equations which are difficult to solve
even for such a simple example as the one presented in section 2 above.

Deming and Stephan are to be commended for seeking an easier method
but there is no justification (even as a device for saving effort) for their
modification of the “principle of least squares” so as to imply erroneously that

(1) weights of correlated sample frequencies are inversely proportional to
their variances, and

(2) sample frequencies are, in general, approximately proportional to their
own sampling variances.

Strangely enough, these two errors were applied in combination by Deming and
Stephan to obtain good practical approximations to the ideal least squares
weights. It might be argued that the second misleading implication is really
not an error because it is offered as a simplifying approximation, but it is an
integral part of both the normal equations approach in the joint article [2]
and Deming’s adaptation [1] of the Stephan iterative method; that is, in each
case the method would have to be revised if better approximations to the ideal
least squares weights were used. More explicitly, Deming (1) uses $n_{ij}$ for
Stephan’s $c_{ij}$ in equation (7.4); (2) identifies it with the other $n_{ij}$ in the same
equation; and (3) reduces the equation to a different form, thus effectively preventing
the use of successive approximations to the $c_{ij}$ without returning to Stephan’s
iterative method in the general form given by equation (7.4) above, which
Deming does not present at all. Results of the joint article [2] are quoted by
Stephan [5] without any explanation of the nature of the errors, but none of
these results are used in the development of his iterative method which, as noted
above, is applicable to any arbitrarily chosen set of weights. The fact that
Stephan corrected the second error without correcting the first implies that the
weights he actually used are unsatisfactory. In Deming’s adaptation of the
Stephan iterative method, a much better set of weights is obtained, not by
correcting the first offsetting error overlooked by Stephan, but by resurrecting the
second offsetting error which Stephan had corrected. Since this error is an
integral part of Deming’s adaptation, Deming’s theoretical discussion implies
that his own efficient statistics are only rough approximations which are definitely
inferior to the inefficient statistics obtained by means of the weights chosen by
Stephan. These inconsistencies are most clearly brought out by Deming when
he says:

“Strictly, in random sampling, the reciprocal of the weight of $n_{ij}$ is $np_{ij}q_{ij}$, which is
nearly equal to $n_{ij}q_{ij}$, where p and q have their usual connotations. But since factors
proportional to the weights may be substituted for them, it is sufficient to use $n_{ij}$ as the
reciprocal of the weight in cell ij, since the values of $q_{ij}$ do not usually vary much over the
table.” [1, p. 102]

In any given problem the seriousness of the error in the first statement in
the foregoing quotation depends on the variation among the $q_{ij}$’s. In the
particular example used by Deming the error is of considerable importance because
the largest $q_{ij}$ is more than 40 per cent larger than the smallest $q_{ij}$. The weights
actually used by Deming agree with weights implied by the ideal method of least
squares except for sampling errors in the $n_{ij}$; hence, the error in any relative
weight converges stochastically to zero so that Deming’s statistics are efficient
according to the theory of large samples. The efficiency of Deming’s statistics
is inconsistent with the theory presented by Deming which implies erroneously
that efficiency of estimation depends on approximate equality of cell proportions.
If this argument were true it would apply also to the method of maximum
likelihood and all other methods which yield efficient practical statistics in
sample-frequency problems. The foregoing discussion, together with the results
of section 8, shows that the theory as presented by Deming has the following
seriously misleading features:

(1) it is based on a paradox in which a good final result is obtained by means 
of compensating errors, 

(2) it presents his efficient statistics in an unnecessarily unfavorable light,

(3) it emphasizes the irrelevant condition of approximate equality of universe 
cell proportions, 

(4) it fails to mention the important condition of proportionality by rows 
and columns, and 

(5) it makes least squares, minimum chi-square, and maximum likelihood 
seem to be competing alternative methods. 

Of these undesirable characteristics, the last two are probably the most serious
because they make the effective integration and adaptation of statistical
techniques more difficult. As has been shown in sections 4, 5, and 6, the
sample-frequency form of the ideal method of least squares is the method of minimum
chi-square which always leads (by means of appropriate practical
approximations to unknown weights) to maximum likelihood statistics; in other words,
the methods are equivalent from a practical point of view.

Since the ideal method of least squares based on the unknown $np_{ij}$ determines
fully efficient, but theoretical, ideal linear estimates, the efficiency of practical
approximations to ideal linear estimates depends on the accuracy with which
the denominators of chi-square are estimated. For the unknown denominators
$np_{ij}$, Deming uses the sample frequencies $n_{ij}$ while the method of maximum
likelihood implies the use of the corresponding maximum likelihood estimates,
statistics which, in general, have smaller sampling variances. The foregoing
argument suggests that maximum likelihood statistics are slightly superior to
Deming’s statistics for any given finite value of n and that their relative
advantage increases as the sample size decreases. In large samples both methods
yield efficient statistics because the relative errors in the weights implied by
either method converge stochastically to zero as n increases. Although the
advantage of maximum likelihood statistics over Deming’s statistics is
unimportant except in small samples, it can be shown that Deming’s choice of weights
leads to imperfectly compensated negative errors of estimation even in his
large sample of 33,837 observations.

Deming weights each sample frequency by its own reciprocal. Positive errors
of sampling decrease the value of the reciprocal and thus increase the absolute
size of the required negative adjustments. Negative errors of sampling increase
the value of the reciprocal and thus decrease the size of the positive adjustment.
Thus every error of sampling (either positive or negative) leads to a negative
error of estimation due to inappropriate weighting. Because the sum of all
adjustments must be zero, these negative errors of estimation are compensated
on the average but more or less imperfectly. The net effect of this imperfect
compensation of negative errors of estimation is that Deming’s statistics are
too small in those cells in which the relative adjustments (either positive or
negative) are large, and vice versa. In a preliminary draft of this article,
this type of error of estimation was studied by comparing Deming’s statistics
with the corresponding maximum likelihood statistics in connection with Deming’s
example involving 33,837 observations. Although errors of estimation of the
type under discussion are apparent, they are, of course, extremely small in such
a large sample. For this reason the large-sample comparison has been deleted
in favor of simple hypothetical examples designed to throw light on similar errors
of estimation in statistics derived by Fisher’s method of minimum chi-square
as well as in those derived by Deming’s adaptation of Stephan’s iterative
method.

Consider a set of sample frequencies in a two-by-two table for which all
expected marginal totals are equal. For this special case, the cell proportions
on each diagonal are equal and the ideal linear estimate (which is also the
maximum likelihood estimate) of any cell proportion is the mean of the two
sample cell proportions on its diagonal. For the same case, Deming’s adaptation
of the Stephan iterative method yields an estimate for each cell which is
proportional to the harmonic mean of sample proportions on its diagonal while
Fisher’s method of minimum chi-square yields estimates proportional to the
corresponding quadratic means.

As a numerical example of the foregoing problem consider the set of
frequencies

(7.5)    $n_{11}, n_{12}, n_{21}, n_{22} = 1, 4, 3, 2,$

obtained in a sample of 10 observations selected at random from a universe
in which the cell proportions are known to be

(7.6)    $p_{11}, p_{12}, p_{21}, p_{22} = p,\ 0.5 - p,\ 0.5 - p,\ p.$

As estimates of the parameter p, the ideal linear estimate is .15, Deming’s
adaptation of the Stephan iterative method yields .14, and Fisher’s method of
minimum chi-square yields .1545 to four decimal places, the other two estimates
being exact. The results illustrate the imperfectly compensated errors of
estimation explained previously. The two sample frequencies on the principal
diagonal ($n_{11}$ and $n_{22}$) have greater relative dispersion than the frequencies on
the other diagonal. For this reason, the relative adjustments made by Deming’s
method are greater and, according to the principle of imperfectly compensated
negative errors of estimation, the estimate of p obtained by Deming’s method
is smaller than the ideal linear estimate of p. Fisher’s method of minimum
chi-square yields an estimate of p which is greater than the ideal linear estimate.
In fact, one should usually expect imperfectly compensated errors of estimation
in statistics derived by Fisher’s method of minimum chi-square to be opposite in
sign and about half as large as those in the corresponding statistics derived by
means of Deming’s adaptation of the Stephan iterative method.
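
The three estimates can be checked with a few lines of arithmetic. The sketch below is only an illustration of the statement that the estimates are proportional to the arithmetic, harmonic, and quadratic means of the sample proportions on each diagonal; the function names are hypothetical, and the rescaling step assumes the structure of equation (7.6), in which the two diagonal values must add to 0.5.

```python
from math import sqrt


def estimate_p(n11, n12, n21, n22, mean):
    """Estimate p when p11 = p22 = p and p12 = p21 = 0.5 - p.

    `mean` combines the two sample proportions on a diagonal; the two
    diagonal values are then rescaled so that they add to 0.5.
    """
    total = n11 + n12 + n21 + n22
    principal = mean(n11 / total, n22 / total)
    other = mean(n12 / total, n21 / total)
    return 0.5 * principal / (principal + other)


def arithmetic(x, y):
    return (x + y) / 2


def harmonic(x, y):
    return 2 * x * y / (x + y)


def quadratic(x, y):
    return sqrt((x * x + y * y) / 2)


sample = (1, 4, 3, 2)                        # frequencies of equation (7.5)
p_ideal = estimate_p(*sample, arithmetic)    # ideal linear (maximum likelihood)
p_deming = estimate_p(*sample, harmonic)     # reciprocal-frequency weights
p_fisher = estimate_p(*sample, quadratic)    # minimum chi-square
```

The departure of the quadratic-mean estimate from .15 (about .0045) is opposite in sign and roughly half the size of the departure of the harmonic-mean estimate (.01), in line with the remark above.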

At this point, it should be emphasized that Fisher does not recommend his
own method of minimum chi-square in preference to the method of maximum
likelihood. In fact, he presents the theory of estimation in such a way as to
imply correctly that the method of maximum likelihood is superior, especially
in small samples. Other writers have noted the small differences between
equations of maximum likelihood and those for minimizing chi-square by Fisher’s
method and some have even derived one set of equations from the other by
neglecting higher order terms in a Taylor series expansion. These derivations
are of no interest here because they seem to justify the method of maximum
likelihood as a simple approximation to some more complicated method. This
type of justification is both unnecessary and undesirable. It is more useful to
regard the method of maximum likelihood as an approximation to a method,
least squares, for which the theory is simpler.

Skeptical readers who find the foregoing argument unconvincing may be able
to profit from the following example. Consider the problem of estimating the
parameter p where 2p is the proportion of white balls in an urn. A sample of 10
balls is selected and classified by the following process. Each white ball is
placed in one of the cells on the principal diagonal of a two-by-two table, the
particular cell being decided by the toss of a coin. A similar method is used for
non-white balls placed in cells on the other diagonal. Assuming that the results
of this process are given by equation (7.5), which of the three alternative
estimates of p given above should be preferred? Belief in the general superiority
of Fisher’s method of minimum chi-square seems to imply that the device of
coin-tossing described in this example can be used in practical problems involving
the estimation of the proportion of “successes” to secure estimates which are
superior to the sample proportion, the ideal linear estimate in such cases.
Even if it is possible to construct trivial special case examples supporting some
complicated method for such problems, the general use in practical problems of
the coin-tossing device in connection with either Fisher’s method of minimum
chi-square or Deming’s adaptation of the Stephan iterative method would be
absurd, as this example is intended to emphasize.

8. The method of proportional distribution of marginal adjustments. The
method of proportional distribution of marginal adjustments is a general method
of adjusting sample frequencies so that their row and column totals agree with
known expected marginal totals. In other words, the adjusted frequency for
the cell in the ith row and the jth column is given by the equation


(8.1)    $m^*_{ij} = n_{ij} + p_{.j} d_{i.} + p_{i.} d_{.j},$

where

         $d_{i.} = m_{i.} - n_{i.}$

and

         $d_{.j} = m_{.j} - n_{.j}$

are the net adjustments in the sample cell frequencies of the ith row and the
jth column, respectively. The asterisk is used to distinguish maximum
likelihood estimates $m_{ij}$ and the ideal linear estimates $m'_{ij}$ from the set of statistics
based on equation (8.1).
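
As a small illustration, the adjustment can be computed directly. The sketch below reads equation (8.1) as $m^*_{ij} = n_{ij} + p_{.j} d_{i.} + p_{i.} d_{.j}$, with the marginal proportions taken from the known expected totals; that reading, the function name, and the two-by-three data are assumptions made for the example, and the sample size is assumed to equal the sum of the expected totals.

```python
def proportional_distribution(n, row_tot, col_tot):
    """Adjust table n to known expected marginal totals by equation (8.1).

    Each row adjustment d_row[i] is spread over the columns in proportion
    to p_col[j], and each column adjustment d_col[j] over the rows in
    proportion to p_row[i].
    """
    r, s = len(n), len(n[0])
    total = sum(row_tot)
    p_row = [t / total for t in row_tot]
    p_col = [t / total for t in col_tot]
    d_row = [row_tot[i] - sum(n[i]) for i in range(r)]
    d_col = [col_tot[j] - sum(n[i][j] for i in range(r)) for j in range(s)]
    return [[n[i][j] + p_col[j] * d_row[i] + p_row[i] * d_col[j]
             for j in range(s)] for i in range(r)]


# Hypothetical sample of 120 observations and expected marginal totals.
n = [[10.0, 20.0, 30.0], [25.0, 15.0, 20.0]]
row_tot = [66.0, 54.0]
col_tot = [30.0, 40.0, 50.0]
m_star = proportional_distribution(n, row_tot, col_tot)
row_sums = [sum(row) for row in m_star]
col_sums = [sum(m_star[i][j] for i in range(2)) for j in range(3)]
```

Each cell is obtained by one pass of arithmetic that does not depend on any other cell, and the row and column sums of the adjusted table reproduce the expected totals because the net adjustments in each direction add to zero.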

The method of proportional distribution of marginal adjustments yields ideal
linear estimates when the universe cell proportions are proportional by rows and
by columns, i.e., when

(8.2)    $p_{ij} = p_{i.} p_{.j}.$

This important principle can be established by substituting in equation (7.4)
of section 7 the quantities

(8.3)    $c_{ij} = np_{i.}p_{.j},$

         $p_i = 0.5 + d_{i.}/np_{i.},$

and

         $q_j = 0.5 + d_{.j}/np_{.j},$

and reducing the typical equation of the ideal method of minimum chi-square
to the form of equation (8.1) which defines the method of proportional
distribution of marginal adjustments.

Even in the absence of exact proportionality, under which it yields fully
efficient statistics, the method of proportional distribution of marginal
adjustments has the following relative advantages over other available methods:

(1) ease of extension to tables of higher order;

(2) exact agreement with known (expected) marginal totals;

(3) simplicity of interpretation;

(4) independence of computational errors;

(5) rapidity of processing;

(6) economy of effort; and

(7) fully efficient criteria for testing the significance of departures from
proportionality of rows and columns.

Ease of extension to tables of higher order is a desirable property of the
method of proportional distribution of marginal adjustments. Equation (8.1)
applies to the special case in which there are only two bases of classification.
In the more general case sample observations are cross-classified according to
r bases of classification, each cell frequency in an rth order table being the
number of observations in the corresponding rth order class whose expected value
is to be estimated. The required adjustment for each first order class (obtained
by subtracting the sample total from its known expected value) is distributed
among the various cells in proportion to the universe totals of the corresponding
(r − 1)th order classes to which the cells belong. The general process is
illustrated by

(8 4) m*, k = n l]k + p r ,d, ]k + p.,d t K + p , 

the formula for estimating the expected frequency in the general cell of a third
order table.

Exact agreement with marginal totals follows easily from the method of
proportional distribution and can be established algebraically by summing the
estimation equation by first order classes; e.g., summing equation (8.1) by rows
and columns. In practice, discrepancies are always either errors of rounding
or mistakes in computation; they are never due to lack of convergence of iterative
processes as is often true in alternative methods of estimation.

Although simplicity of interpretation is desirable in general, it is especially
important when random sampling is an unrealistic abstraction. For example,
the method of proportional distribution of marginal adjustments has been used
to estimate the cell proportions in a two-way classification of incomes from known
marginal proportions and a detailed cross classification at an earlier date. In
this problem known shifts in income distributions made it evident that certain
cells previously vacant should not have the zero proportions which would be
estimated for them by other available methods of estimation. The ease with
which the effects of the method of adjustment can be traced is important also
in the analysis of the results of sample surveys in which various types of bias
are important.

The method of proportional distribution of marginal adjustments yields the
estimated expected frequency for any cell by a single sequence of computations
which is independent of the corresponding process for any other cell. Errors
made in computing the estimate for any cell appear in marginal totals of
estimates for all first order classes which include that cell. If only a few errors are
made in a table they can be localized immediately and can be corrected without
recomputing any estimates which are correct.

In certain types of social surveys, rapidity of processing is so important that,
as Deming puts it, “the delay of only the brief time required for adjustment
may not be advisable” [1, p. 102]. Under these conditions, it is important to
have a simple formula like equation (8.1) in which substitutions can be made
rapidly. Even when the time element is relatively unimportant, the economy
of effort and the ease of explaining the method to clerical assistants are often
of practical importance.

Finally, departures from proportionality among rows and columns often
provide the chief element of interest in research studies, not only in social
surveys of the type illustrated in Deming and Stephan’s example but also in
biological sciences. The most effective tests of significance for the purpose of
presenting statistical evidence of lack of proportionality are those based on
statistics like those derived by the method of proportional distribution of marginal
adjustments whose efficiency is 100 per cent when proportionality is exact.

Even when proportionality is not exact, the efficiency of statistics derived 
by proportional distribution may be close to 100 per cent under fairly typical 
problem conditions such as those in the example by Deming and Stephan wherein 
the other more complicated methods require several times as much computational 
effort, but have little advantage over the easier method with respect to
efficiency of estimation in this particular problem.

9. Suggested improvements in techniques. In section 7, a method was
outlined by which it is possible to derive sets of maximum likelihood statistics
by merely integrating available techniques without changing any of them.
In this section a number of improvements are suggested. At this point it should
be emphasized that a given change is not an improvement merely because it
yields slightly more accurate estimates or makes possible a slight saving of
time and effort. In each case the research worker should consider saving of time
and effort and accuracy of estimation simultaneously. In particular, it seems
likely that most social surveys of the type considered by Deming and Stephan
are characterized by approximate proportionality by rows and by columns,
conditions relatively favorable to the simple method of proportional
distribution of marginal adjustments. It should be clearly understood that
suggestions in this section are intended for those research workers whose problems
justify a great deal more effort than is required to adjust sample frequencies
by this simple method.

Assuming that the problem at hand warrants the effort required to derive
maximum likelihood estimates, the first consideration is the derivation of a
set of first approximations to the $m_{ij}$, and a set of values of $p_i(1)$,
first approximations to the $p_i$. Even if proportionality by rows and by columns
is not closely approximated, use of the values of the $p_i(1)$ provided by equation (8.3)
is especially to be recommended. In the example used by Deming these
values for the $p_i(1)$ are so much better than the values recommended by Deming
that they save a large proportion of the effort required by the iterative process.
If rows and columns are approximately proportional, equation (8.1) should be
used to provide values of the $m_{ij}(1)$, in which case it is possible to use an
iterative process similar to the one used by Deming but based on the typical
equation of maximum likelihood (7.3) to achieve a given degree of accuracy in the
maximum likelihood estimates with even less effort. Under favorable conditions
such as those in Deming’s example the suggested iterative process yields excellent
approximations to maximum likelihood estimates by means of the following
steps:

1. Construct a set of first approximations to the r row components of the rs
maximum likelihood divisors $(a_i + b_j)$ by means of the equation

(9.1)    $a_i(1) = n_{i.}/np_{i.} - 1/2.$

2. Compute successive approximations to the $a_i$ and $b_j$ by means of the
equations

(9.2)    $b_j(g) = [n_{.j} - \sum_i m_{ij}(1)\, a_i(g)]/np_{.j},$

(9.3)    $a_i(g + 1) = [n_{i.} - \sum_j m_{ij}(1)\, b_j(g)]/np_{i.},$

where $m_{ij}(1)$, the first approximation to $m_{ij}$, is derived by means of equation
(8.1). Just as in Deming’s iterative process, the expression in brackets is a
series of products which can be subtracted in a single sequence of machine
operations and the final division can be performed without having to record
any of the intermediate results.

3. Divide the sample frequencies by the maximum likelihood divisors to obtain
the maximum likelihood estimates

(9.4)    $m_{ij} = n_{ij}/(a_i + b_j),$

where limiting values of $a_i$ and $b_j$ are approximated as closely as desired by
successive approximations in the preceding equations.

Under unfavorable conditions, the iterative process of this section is not
always the easiest way to obtain satisfactory estimates. For example, when
samples are small and/or rows and columns are not approximately proportional,
it is better to use the iterative method as originally presented by Stephan where
sample frequencies can be used for first approximations to the $c_{ij}$ and these may
be replaced by successively better approximations.

The point made in the final paragraph of Fisher’s well-known book [3] that
“in practice one need seldom do more than solve, at least to a good
approximation, the equation of maximum likelihood,” is strongly supported by the
developments of this article. In addition, the proof that the method of least squares
and the method of minimum chi-square always lead (by means of
approximations to ideal weights) to maximum likelihood statistics greatly facilitates the
adaptation of techniques developed in connection with these hitherto competing
methods.


A STATISTICAL PROBLEM CONNECTED WITH THE COUNTING OF
RADIOACTIVE PARTICLES

By Sten Malmquist

Institute of Statistics, University of Upsala, Sweden

1. Introduction. Our problem refers to random events forming a sequence
in time or in space, e.g. particles emitted by a radioactive matter. By omitting
certain elements of the given sequence, say f, we form another sequence, say g.
The rule of omission involves an arbitrarily prescribed constant u. The rule
to be followed in forming g is:

Case I: Let a be an element in f and g. The next element to be included
in g is then the first element in f which follows a after a distance greater than u.

Case II: Let a be an element in f and g. The next element to be included in
g is then the first element in f which follows a at a distance greater than u from
the preceding element in f, whether this belongs to g or not.
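
The two rules of omission can be made concrete with a short sketch. It is an illustration only: the function names and the sample times are hypothetical, the elements of f are assumed to be given as an increasing list of positions, and the first element of f is assumed to belong to g.

```python
def case_one(times, u):
    """Case I: keep an element only if it follows the last *kept* element
    by a distance greater than u."""
    kept = []
    for t in times:
        if not kept or t - kept[-1] > u:
            kept.append(t)
    return kept


def case_two(times, u):
    """Case II: keep an element only if it follows the preceding element
    of f (whether kept or not) by a distance greater than u."""
    kept = []
    previous = None
    for t in times:
        if previous is None or t - previous > u:
            kept.append(t)
        previous = t
    return kept


f = [0.0, 0.5, 1.2, 1.5, 3.0, 3.4, 5.0]   # hypothetical sequence f
g_one = case_one(f, u=1.0)                # [0.0, 1.2, 3.0, 5.0]
g_two = case_two(f, u=1.0)                # [0.0, 3.0, 5.0]
```

In Case II every element of f, counted or not, restarts the blocked interval, so the sequence g obtained under Case II is never longer than the one obtained under Case I for the same f and u.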

When the events are represented by impulses emitted by a radioactive matter
and feeding a recorder with a constant resolving time u, the new sequence
consists of the counted impulses. The two cases correspond to the reaction of
different types of recorders. The distinction between the two transformations
has caused some confusion. It has, however, been clearly pointed out by
Ruark and Brammer [5].

v. Bortkiewicz [2] seems to be the first who has considered problems related
to the transformed sequence. Starting from investigations by Rutherford,
Geiger, and others, concerning the number of recorded α-particles during a
certain interval of time, say T, he observed that the distribution of this number
was similar to that of Poisson but with a slightly smaller dispersion. This fact
he supposed to be caused by a constant resolving time u of the recorder. By
means of certain assumptions he tried to calculate the effect on the mean and
the dispersion by the transformation in Case I, supposing the cumulative
distribution function F(t) for the distance between two consecutive elements in
the sequence f is given by

         $F(t) = 1 - e^{-\lambda t},$

where here and in what follows, t denotes a non-negative variable.

Considering Case II with F(t) as above, Levert and Scheen [4] have recently
worked out an expression for the distribution of the number of elements during
T in the sequence g.

Gnedenko [3] has considered the distribution of the number of lost elements 
in Case I with particular regard to the initial state of rest. 

Alaoglu and Smith [1] considered problems referring to successive
transformations of a sequence. When, for example, a sequence of particles enters
a tube-counter and amplifier, together acting with a resolving time $u_1$, and
the impulses then are feeding a recorder with resolving time $u_2 > u_1$, the
sequence of recorded impulses will be the result of two successive transformations.
If we have a scaling circuit between the counter and the recorder, we have to
make a transformation of another type between the two transformations in
Case I and Case II.

The present paper deals with the transformed sequence in Case I. The distribution function $F(t)$ is supposed to be arbitrary. An advantage of this generalization is that the formulas derived could be used in treating problems referring to successive transformations.

The author wishes to express his sincere gratitude to Professor Herman Wold for stimulating discussions and valuable advice.


2. Derivation of distributions for case I. Suppose that the sequence $f$ has $F(t)$ for distribution function for the distance between two consecutive elements. $F(t)$ is supposed to be independent of absolute time (space), and of the preceding distance between two elements. When not stated otherwise, we further suppose $F(0) = 0$.

Now let $G(t)$ be the distribution function for the distance between two consecutive elements in the transformed sequence $g$. Evidently $G(t)$ also is independent of absolute time and of the preceding distance between two elements.

We shall consider certain distribution functions connected with $F(t)$. These functions will then be used in solving problems concerning the sequence $g$.

Let $F_n(t)$ be the distribution function for the distance between the first and the last of $n + 1$ consecutive elements in the sequence $f$. Then $F_n(t)$ is given by the recursive system

$$F_{m+n}(t) = \int_0^t F_m(t - x)\, dF_n(x); \qquad F_1(t) = F(t). \tag{1}$$

As is easily seen, we have, for $m, n \ge 1$,

$$F_{m+n}(t) \le F_m(t) \cdot F_n(t);$$

and, for $t = u$,

$$F_n(u) \to 0, \quad \text{as } n \to \infty;$$

$$\sum_{n=1}^{\infty} F_n(u) < \infty, \quad \text{provided that } F_1(0) < 1.$$

Alternatively, $F_n(t)$ could be deduced by the use of characteristic functions. Still considering the sequence $f$, let $\Phi(t)$ be the distribution function for the distance $d$ between an arbitrarily chosen point and the following element. Suppose that the arbitrary point is chosen so that the distance between the preceding and the following element is $x$. Under this condition we have, in usual symbols,

$$P(d > t) = \frac{x - t}{x}, \qquad 0 \le t \le x.$$

Hence,

$$\Phi(t) = 1 - \int_t^{\infty} \frac{x - t}{x}\, dH(x),$$

where $H(t)$ is the distribution function for the distance $x$.

To deduce $H(t)$ we suppose that the distribution $F(t)$ has a finite mean,

$$m = \int_0^{\infty} t\, dF(t).$$

By the definition of $H(t)$, we then have

$$H(x) = \frac{1}{m} \int_0^x t\, dF(t).$$

Thus

$$\Phi(t) = \frac{1}{m}\left[\int_0^t x\, dF(x) + t \int_t^{\infty} dF(x)\right]. \tag{2}$$

The corresponding frequency function $\varphi(t)$ is given by

$$\varphi(t) = \frac{1 - F(t)}{m}. \tag{3}$$


Consider $n + 2$ consecutive elements in $f$, say $a_0, a_1, \ldots, a_{n+1}$, where $a_0$ is an element in the transformed sequence $g$. The probability $P_n$ that the next element in $g$ following $a_0$ will be $a_{n+1}$ is given by

$$P_n = F_n(u) - F_{n+1}(u), \quad (n = 1, 2, \ldots),$$

$$P_0 = 1 - F(u).$$

Now let $P_n(t)$ be the probability that the distance between $a_0$ and $a_{n+1}$ is smaller than or equal to $t$, when $a_0$ and $a_{n+1}$ are two consecutive elements in the sequence $g$. Then

$$P_n(t) = \frac{1}{F_n(u) - F_{n+1}(u)} \int_0^u [F(t - x) - F(u - x)]\, dF_n(x).$$

Let $G^*(t)$ be defined by

$$G^*(t) = \sum_{n=0}^{\infty} P_n \cdot P_n(t) = F(t) - F(u) + \sum_{n=1}^{\infty} \int_0^u [F(t - x) - F(u - x)]\, dF_n(x); \qquad t > u.$$
When $G^*(t)$ is a distribution function, then $G^*(t)$ equals $G(t)$.

For $t_1 < t_2$, we obviously have $G^*(t_1) \le G^*(t_2)$.

For $t \to \infty$,

$$G^*(\infty) = 1 - F(u) + \sum_{n=1}^{\infty} \int_0^u [1 - F(u - x)]\, dF_n(x) = 1 - F(u) + \sum_{n=1}^{\infty} F_n(u) - \sum_{n=1}^{\infty} F_{n+1}(u) = 1.$$

Hence we take

$$G(t) = G^*(t), \quad t > u; \qquad G(t) = 0, \quad t \le u. \tag{4}$$

When the corresponding frequency functions $g(t)$ and $f(t)$ exist, we get

$$g(t) = f(t) + \sum_{n=1}^{\infty} \int_0^u f(t - x) f_n(x)\, dx, \qquad t > u. \tag{5}$$

Dealing with a sequence of elements we are often concerned with the number of occurrences during a certain time $T$.

Let the mean number of occurrences during $T$ be $M(T)$. Supposing that the mean $m = \int_0^{\infty} t\, dF(t)$ is finite and that $F(0) < 1$, we have

$$M(T) = T/m. \tag{6}$$


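We define

$$K_1(t) = \begin{cases} F(t) & \text{for } t \ge \epsilon \\ 0 & \text{for } t < \epsilon \end{cases} \qquad K_2(t) = \begin{cases} F(t) & \text{for } t \ge \epsilon \\ F(\epsilon) & \text{for } t < \epsilon \end{cases}$$

and denote the corresponding means by $M_1(T)$ and $M_2(T)$.

As is easily seen,

$$M_1(\epsilon) \le M(\epsilon) \le M_2(\epsilon).$$

Using (2),

$$M_1(\epsilon) = \frac{\epsilon F(\epsilon) + \epsilon[1 - F(\epsilon)]}{\int_0^{\infty} x\, dK_1(x)} = \frac{\epsilon}{\int_0^{\infty} x\, dK_1(x)},$$

$$M_2(\epsilon) = \frac{1 \cdot \epsilon[1 - F(\epsilon)]^2 + \cdots + n \cdot \epsilon[1 - F(\epsilon)]^2 F(\epsilon)^{n-1} + \cdots}{\int_0^{\infty} x\, dK_2(x)} = \frac{\epsilon}{\int_0^{\infty} x\, dK_2(x)}.$$

Making $N = T/\epsilon$ and summing, we obtain

$$M_1(T) = \frac{T}{\int_0^{\infty} x\, dK_1(x)} = \frac{T}{\int_{\epsilon}^{\infty} x\, dF(x) + \epsilon F(\epsilon)},$$

$$M_2(T) = \frac{T}{\int_0^{\infty} x\, dK_2(x)} = \frac{T}{m - \int_0^{\epsilon} x\, dF(x)}.$$

By choosing $\epsilon$ arbitrarily small, we get

$$M(T) \to T/m.$$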

Let $P(n, T)$ be the probability that we get $n$ elements in $f$ during a time $T$. Suppose that the first of these elements, $a_1$, comes at $T_0 + x$, and the last, $a_n$, at $T_0 + x + y$. We then have

$$P(n, T) = \int_0^T \varphi(x)\, dx \int_0^{T - x} [1 - F(T - x - y)]\, dF_{n-1}(y). \tag{7}$$

In (4) and (7) we have equations for the transformation in Case I. Because of the general form of $F(t)$, the formulas also can be used when we are concerned with successive transformations. It can further be remarked that the transformation of a sequence of impulses by passing a scaling circuit is expressed by the system (1).


3. Results for a particular form for $F(t)$. The preceding formulas will now be used for a special distribution function $F(t)$. Suppose that the frequency function $f(t) = dF(t)/dt$ is equal to the frequency function of the distance between an arbitrary point and the following element.

From (3) we get

$$F'(t) = \frac{1 - F(t)}{m},$$

or, when $F(0) = 0$,

$$F(t) = 1 - e^{-at}; \tag{8}$$

$$f(t) = a e^{-at}, \tag{9}$$

where $1/a = m = \int_0^{\infty} t f(t)\, dt$.

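By means of the theory of characteristic functions we have

$$f_n(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} [\eta(x)]^n\, dx; \qquad f_1(t) = f(t); \tag{10}$$

where

$$\eta(x) = a \int_0^{\infty} e^{-at} e^{itx}\, dt = \frac{a}{a - ix}. \tag{11}$$

Thus

$$f_n(t) = \frac{a^n}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-itx}}{(a - ix)^n}\, dx. \tag{12}$$

For $n = 1$, we get

$$e^{-at} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-itx}}{a - ix}\, dx. \tag{13}$$

By differentiating (13) $n - 1$ times with respect to $a$ we obtain

$$(-1)^{n-1} t^{n-1} e^{-at} = \frac{1}{2\pi} (-1)^{n-1} (n - 1)! \int_{-\infty}^{\infty} \frac{e^{-itx}}{(a - ix)^n}\, dx.$$

Hence, from (12),

$$f_n(t) = \frac{a^n t^{n-1} e^{-at}}{(n - 1)!}. \tag{14}$$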

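From (5) we obtain the frequency function for the transformed sequence

$$g(t) = a e^{-at} + \sum_{n=1}^{\infty} \int_0^u a e^{-a(t - x)} \frac{a^n x^{n-1} e^{-ax}}{(n - 1)!}\, dx = a e^{au} e^{-at}, \quad t > u; \qquad g(t) = 0, \quad t < u. \tag{15}$$

The mean $m_g$ is given by

$$m_g = a \int_u^{\infty} t e^{au} e^{-at}\, dt = \frac{1}{a} + u.$$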


Remark: Suppose the constant $u$ is allowed to vary independently of $t$ and that the frequency function of $u$ is $y(u)$; we obtain

$$m_g = \int_0^{\infty} t\, dt \int_0^{\infty} g(u, t) y(u)\, du = \int_0^{\infty} \frac{1}{a} y(u)\, du + \int_0^{\infty} u y(u)\, du = \frac{1}{a} + m(u). \tag{16}$$


Now let the sequence of elements, $g$, by means of (5) be transformed into a new sequence, $h$. When we are concerned with the counting of particles, emitted from a radioactive matter, let the sequence $g$ consist of impulses from a counter-amplifier with resolving time $u$, feeding a recorder with resolving time $u_1$. Then the elements in $h$ are the counted impulses, it being supposed that the tube-counter and the recorder react according to the assumptions. We suppose $u_1 > u$. When $u_1 \le u$, the sequences $g$ and $h$ are identical.

Let $g_n(t)$ denote the frequency function of the distance between the first and the last of $n + 1$ consecutive elements in $g$. We find, in the same way as used in obtaining (14),

$$g_n(t) = \frac{a^n e^{anu}}{(n - 1)!} (t - nu)^{n-1} e^{-at}, \qquad t > nu. \tag{17}$$
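Let $h(t)$ be the frequency function for the distance between two consecutive elements in the sequence $h$. Let further $N$ be the greatest integer smaller than or equal to $u_1/u$.

Using (4) and (5) we obtain

$$h_{I}(t) = a e^{au} e^{-at} \sum_{n=0}^{N} \frac{a^n}{n!} (u_1 - nu)^n e^{anu}, \qquad t > u_1 + u;$$

$$h_{II}(t) = a e^{au} e^{-at} \sum_{n=0}^{N} \frac{a^n}{n!} [t - (n + 1)u]^n e^{anu}, \qquad (N + 1)u < t < u_1 + u; \tag{18}$$

$$h_{III}(t) = a e^{au} e^{-at} \sum_{n=0}^{N-1} \frac{a^n}{n!} [t - (n + 1)u]^n e^{anu}, \qquad u_1 < t < (N + 1)u.$$

The mean is found to be

$$m_h = \left[\frac{1}{a} + u\right] \left[1 + \sum_{n=1}^{N} \sum_{\nu=n}^{\infty} \frac{a^{\nu}}{\nu!} (u_1 - nu)^{\nu} e^{-a(u_1 - nu)}\right]. \tag{19}$$

We also have

$$\int_{u_1 + u}^{\infty} t\, h_I(t)\, dt < m_h < \int_{u_1}^{\infty} t\, h_I(t)\, dt$$

or

$$\left[\frac{1}{a} + u_1 + u\right] \left[\sum_{n=0}^{N} \frac{a^n}{n!} (u_1 - nu)^n e^{-a(u_1 - nu)}\right] < m_h < \left[\frac{1}{a} + u_1\right] e^{au} \left[\sum_{n=0}^{N} \frac{a^n}{n!} (u_1 - nu)^n e^{-a(u_1 - nu)}\right].$$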

We now consider the number of occurrences during a time interval $T$. Using (6), (16), and (19) we immediately get the mean numbers of occurrences during $T$. By (3), we get for the sequence $g$

$$\varphi_g(t) = \begin{cases} \dfrac{a}{au + 1}, & t < u; \\[2ex] \dfrac{a e^{au} e^{-at}}{au + 1}, & t > u. \end{cases} \tag{20}$$

Inserting (20), (15) and (14) in (7) and evaluating the integrals, we finally get

$$P_g(n, T) = \begin{cases} \alpha_{n-1} - 2\alpha_n + \alpha_{n+1}; & n < \dfrac{T}{u} - 1 \\[2ex] \alpha_{n-1} - 2\alpha_n + (n + 1) - \dfrac{aT}{au + 1}; & \dfrac{T}{u} - 1 < n < \dfrac{T}{u} \\[2ex] \alpha_{n-1} - 2\left[n - \dfrac{aT}{au + 1}\right] + (n + 1) - \dfrac{aT}{au + 1}; & \dfrac{T}{u} < n < \dfrac{T}{u} + 1 \end{cases} \tag{21}$$

where

$$\alpha_n = \frac{1}{au + 1} e^{-a(T - nu)} \sum_{\nu=0}^{n} \frac{a^{\nu} (T - nu)^{\nu}}{\nu!} (n - \nu), \quad (n = 0, 1, 2, \ldots); \qquad \alpha_{-1} = 0. \tag{22}$$

When $u = 0$, we obtain

$$\alpha_n = e^{-aT} \sum_{\nu=0}^{n} \frac{(aT)^{\nu}}{\nu!} (n - \nu).$$

For the sequence $f$ we then get the Poisson distribution

$$P_f(n, T) = \frac{(aT)^n}{n!} e^{-aT}. \tag{23}$$

The corresponding expression for the sequence $h$ is much more complicated.
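As a numerical illustration of (21) and (22), the following Python sketch evaluates $P_g(n, T)$. It assumes the reading of (22) in which $\alpha_n$ is a finite sum over $\nu = 0, \ldots, n$ weighted by $(n - \nu)$, and it folds the three cases of (21) into one second difference by setting $\alpha_n = n - aT/(au + 1)$ whenever $nu > T$. The function names `alpha` and `p_g` are illustrative and do not come from the paper.

```python
import math

def alpha(n, a, u, T):
    """alpha_n as read from (22); for n*u > T the linear form used in (21)."""
    if n < 0:
        return 0.0                          # alpha_{-1} = 0
    c = a * T / (a * u + 1.0)               # mean number of elements of g during T
    if n * u > T:
        return n - c                        # boundary cases of (21)
    lam = a * (T - n * u)
    s = sum(lam ** v / math.factorial(v) * (n - v) for v in range(n + 1))
    return math.exp(-lam) * s / (a * u + 1.0)

def p_g(n, a, u, T):
    """Probability of n elements of the transformed sequence during T."""
    return alpha(n - 1, a, u, T) - 2.0 * alpha(n, a, u, T) + alpha(n + 1, a, u, T)

# Parameters of the experiment in Section 4: 1/a = log10(e), u = 0.2, T = 1.5.
a = math.log(10.0)
u, T = 0.2, 1.5
probs = [p_g(n, a, u, T) for n in range(9)]
expected = [239 * p for p in probs]         # 239 intervals, as in Table I
```

With $u = 0$ the same function reduces to the Poisson probabilities (23), which gives a quick consistency check on the reading of (22).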


4. A statistical experiment. The following statistical experiment will serve as an illustration of the scheme dealt with in this paper: the transformation of a sequence and the resulting formulas, especially (21).

Groups of five figures, the last rounded up if necessary, have been extracted from tables of random sampling numbers [6]. Let each group denote the first five digits for a decimal $x$, arbitrarily chosen between 0 and 1. The variable $x$ is supposed to have the distribution function $t$ for $0 < t < 1$. We now define a new variable, $y$, given by

$$y = -k \log (1 - x), \quad [\text{or } y = -k \log x]. \tag{24}$$

The variable $y$ has the distribution function given by (8), viz.

$$F(t) = 1 - e^{-at}, \quad \text{where } \frac{1}{a} = m = k \log e.$$

Transforming each group, or number $x$, according to (24), we get a sample of consecutive distances between elements in the sequence $f$ considered in the previous sections. Choosing a constant $u$, we can construct the corresponding sequence $g$. Beginning with a point, arbitrarily chosen on the first distance, we can finally count the number of elements in successive intervals of the same length.
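The construction just described can be imitated with a pseudo-random generator in place of printed tables. The sketch below is only an illustration under stated assumptions: the logarithm in (24) is taken to base 10 (consistent with $m = k \log e = 0.4343$ for $k = 1$), and Case I is modelled as a recorder that is insensitive for a time $u$ after each counted element only. The function name `simulate_case_one` is hypothetical.

```python
import math
import random

def simulate_case_one(num_intervals, k=1.0, u=0.2, seed=0):
    """Return the distances in the sequence f and in the transformed sequence g."""
    rng = random.Random(seed)
    t = 0.0
    last_counted = None
    gaps_f, gaps_g = [], []
    for _ in range(num_intervals):
        x = rng.random()                    # uniform on [0, 1)
        y = -k * math.log10(1.0 - x)        # transformation (24), base-10 logarithm assumed
        gaps_f.append(y)
        t += y
        if last_counted is None:
            last_counted = t                # the first element is always counted
        elif t - last_counted > u:          # elements within u of a counted one are lost
            gaps_g.append(t - last_counted)
            last_counted = t
    return gaps_f, gaps_g

gaps_f, gaps_g = simulate_case_one(200000)
mean_f = sum(gaps_f) / len(gaps_f)          # should be near 1/a = 0.4343
mean_g = sum(gaps_g) / len(gaps_g)          # should be near 1/a + u = 0.6343
```

Because the exponential distribution has no memory, the mean distance in $g$ exceeds that in $f$ by $u$, in agreement with $m_g = 1/a + u$.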

Take $k = 1$, $u = 0.2$ and $T = 1.5$. We then have for the sequences $f$ and $g$:

$$m_f = \frac{1}{a} = \log e = 0.4343; \qquad m_g = \frac{1}{a} + u = 0.6343;$$

$$\sigma_f = \frac{1}{a} = 0.4343; \qquad \sigma_g = \frac{1}{a} = 0.4343;$$

$$M_f(T) = \frac{T}{m_f} = 3.454; \qquad M_g(T) = \frac{T}{m_g} = 2.365.$$

The experiment yielded the following results.

For the sequence $f$: number of elements 801, $\hat{m}_f = 0.450$.
For the sequence $g$: number of elements 555, $\hat{m}_g = 0.648$.

In neither case is the deviation between the observed and theoretical means statistically significant. In fact, the standardized deviations

$$\frac{(\hat{m}_f - m_f)\sqrt{800}}{\sigma_f} \quad \text{and} \quad \frac{(\hat{m}_g - m_g)\sqrt{554}}{\sigma_g}$$

give $P = 0.3$ and $P = 0.4$, respectively.


TABLE I

Nos. of intervals with n elements

              Sequence f                     Sequence g
        ----------------------   ------------------------------------
   n    Observed    Expected     Observed    Expected      Expected
                    according                according     according
                    to (23)                  to (21)       to (23)

   0        6         7.6            5          8.2          23.7
   1       33        26.1           53         42.5          54.8
   2       48        45.1           82         81.8          63.3
   3       55        51.9           69         72.2          48.8
   4       36        44.8           23         29.2          28.1
   5       32        31.0            6          4.8          13.0
   6       17        17.8            1          0.2           5.0
   7-      12        14.7                                     2.4

 Sum      239       239            239        238.9         239

 Mean       3.331     3.454          2.310      2.36          2.31
 chi^2                4.825                     4.524        36.7
 P                    0.68                      0.34         <0.001

The functions $\alpha_n$ in (22) can be calculated by means of Pearson's tables of the incomplete $\Gamma$-function [7]. In the notation of these tables we obtain

Hence 


n 


n \ n n — A 
au 1 nl ~ au -f- 1 


[1 - I(p, ?)], 


d- 
where


1 = a(T - mi), ;> = ; 


q = n- 2. 


In the present case, however, we only need the numbers up to $\alpha_7$. Accordingly the $\alpha_n$ have been calculated directly.

The resulting theoretical and observed distributions for the number of elements during $T$ for the sequences $f$ and $g$ will be found in Table I. For comparison, a Poisson distribution, with the same mean as observed for the sequence $g$, is given. The result of a $\chi^2$ test is also shown in Table I. Judged by the $\chi^2$ test the distributions (23) and (21) agree fairly well with the observed distributions. As was to be expected, the Poisson distribution cannot be used for the sequence $g$.
THE PROBABILITY FUNCTION OF THE PRODUCT OF TWO NORMALLY
DISTRIBUTED VARIABLES 1

By Leo A. Aroian
Hunter College

1. Introduction and summary. Let $x$ and $y$ follow a normal bivariate probability function with means $\bar{X}$, $\bar{Y}$, standard deviations $\sigma_1$, $\sigma_2$, respectively, $r$ the coefficient of correlation, and $\rho_1 = \bar{X}/\sigma_1$, $\rho_2 = \bar{Y}/\sigma_2$. Professor C. C. Craig [1] has found the probability function of $z = xy/\sigma_1\sigma_2$ in closed form as the difference of two integrals. For purposes of numerical computation he has expanded this result in an infinite series involving powers of $z$, $\rho_1$, $\rho_2$, and Bessel functions of a certain type; in addition, he has determined the moments, seminvariants, and the moment generating function of $z$. However for $\rho_1$ and $\rho_2$ large, as Craig points out, the series expansion converges very slowly. Even for $\rho_1$ and $\rho_2$ as small as 2, the expansion is unwieldy. We shall show that as $\rho_1$ and $\rho_2 \to \infty$, the probability function of $z$ approaches a normal curve and in case $r = 0$ the Type III function and the Gram-Charlier Type A series are excellent approximations to the $z$ distribution in the proper region. Numerical integration provides a substitute for the infinite series wherever the exact values of the probability function of $z$ are needed. Some extensions of the main theorem are given in section 5 and a practical problem involving the probability function of $z$ is solved.


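2. Theorems on approach to normality. The moment generating function of $z$, $M_z(\theta)$, is [1]

$$M_z(\theta) = \frac{\exp\left\{\dfrac{(\rho_1^2 + \rho_2^2 - 2r\rho_1\rho_2)\theta^2 + 2\rho_1\rho_2\theta}{2[1 - (1 + r)\theta][1 + (1 - r)\theta]}\right\}}{\sqrt{[1 - (1 + r)\theta][1 + (1 - r)\theta]}}. \tag{2.1}$$

Let $\bar{z}$ and $\sigma_z$ be the mean and the standard deviation of $z$, and $t_z = (z - \bar{z})/\sigma_z$. Now

$$\bar{z} = \rho_1\rho_2 + r, \qquad \sigma_z = \sqrt{\rho_1^2 + \rho_2^2 + 2r\rho_1\rho_2 + 1 + r^2}. \tag{2.2}$$

Using (2.2) we find in the usual way the moment generating function of $t_z$

$$M_{t_z}(\theta) = \frac{\exp\left\{\dfrac{-2rw + (\rho_1^2 + \rho_2^2 + 2r\rho_1\rho_2)w^2 + 4r^2w^2 - 2w^3(r^2 - 1)(\rho_1\rho_2 + r)}{2[1 - (1 + r)w][1 + (1 - r)w]}\right\}}{\sqrt{[1 - (1 + r)w][1 + (1 - r)w]}}, \tag{2.3}$$

where $w = \theta/\sigma_z$.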


1 Presented to the American Mathematical Society, Oct. 28, 1944, New York City. 


Consider $r \ge 0$. Then in the limit as $\rho_1$ and $\rho_2 \to \infty$ in any manner whatever,

$$\lim_{\rho_1, \rho_2 \to \infty} M_{t_z}(\theta) = e^{\theta^2/2}, \tag{2.4}$$

and by the theorem of Curtiss [2] on moment generating functions we see in the limit as $\rho_1, \rho_2 \to \infty$ the probability function of $z$ approaches a normal curve with mean $\bar{z}$ and variance $\sigma_z^2$, $r \ge 0$.

In case $-1 + \epsilon < r < 0$, $\epsilon > 0$, some care is required wherever

$$\sqrt{\rho_1^2 + \rho_2^2 + 2\rho_1\rho_2 r}$$

occurs. If one uses $\rho_1^2 + \rho_2^2 \ge 2\rho_1\rho_2$, the proof goes forward quite readily. Hence we have proved the theorem:

Theorem (2.5). The distribution of $z$ approaches normality with mean $\bar{z}$ and variance $\sigma_z^2$ as $\rho_1$ and $\rho_2 \to \infty$ in any manner whatever, $-1 + \epsilon < r \le 1$, $\epsilon > 0$.

It is evident in Theorem (2.5) we may allow $\rho_1, \rho_2 \to -\infty$ without any other changes. Theorems (2.6) and (2.7) are proved in essentially the same way as (2.5).

Theorem (2.6). The distribution of $z$ approaches normality with mean $\bar{z}$ and variance $\sigma_z^2$, if $\rho_1 \to \infty$, $\rho_2 \to -\infty$, $-1 \le r < 1 - \epsilon$, $\epsilon > 0$.

Theorem (2.7). The distribution of $z$ approaches normality if $\rho_1$ remains constant, $\rho_2 \to \infty$, $-1 + \epsilon < r \le 1$, $\epsilon > 0$; or if $\rho_1$ remains constant, $\rho_2 \to -\infty$, $-1 \le r < 1 - \epsilon$, $\epsilon > 0$.

Naturally in any of the theorems $\rho_1$ and $\rho_2$ may be interchanged. In practice $\rho_1$ and $\rho_2$ are usually positive. The approach to normality is more rapid if both $\rho_1$ and $\rho_2$ have the same sign as $r$.
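The mean and standard deviation in (2.2) can be checked by simulation. The sketch below is an illustration only: it assumes $x$ and $y$ have already been divided by their standard deviations, so that they have unit variances, means $\rho_1$ and $\rho_2$, and correlation $r$. The function name `simulate_z` and the chosen parameter values are hypothetical.

```python
import math
import random

def simulate_z(rho1, rho2, r, n, seed=0):
    """Draw n values of z = x*y for unit-variance normal x, y with correlation r."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - r * r)
    zs = []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        x = rho1 + e1
        y = rho2 + r * e1 + s * e2          # gives correlation r between x and y
        zs.append(x * y)
    return zs

rho1, rho2, r = 2.0, 3.0, 0.5
zs = simulate_z(rho1, rho2, r, 200000)
mean_z = sum(zs) / len(zs)
sd_z = math.sqrt(sum((z - mean_z) ** 2 for z in zs) / len(zs))

# Values given by (2.2) for comparison.
mean_theory = rho1 * rho2 + r
sd_theory = math.sqrt(rho1 ** 2 + rho2 ** 2 + 2 * r * rho1 * rho2 + 1 + r ** 2)
```

For these parameters (2.2) gives $\bar{z} = 6.5$ and $\sigma_z = 4.5$, and the simulated values should agree with them up to sampling error.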


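3. Numerical values. In order to show how closely the Type III and the Gram-Charlier Type A series approximate the probability function of $z$, $f(z)$, or more precisely $f(z, \rho_1, \rho_2, r)$, we use numerical integration where

$$f(z, \rho_1, \rho_2, r) = I_1(z) - I_2(z),$$

$$I_1(z) = \frac{1}{2\pi\sqrt{1 - r^2}} \int_0^{\infty} \exp\left\{-\frac{1}{2(1 - r^2)}\left[(x - \rho_1)^2 - 2r(x - \rho_1)\left(\frac{z}{x} - \rho_2\right) + \left(\frac{z}{x} - \rho_2\right)^2\right]\right\} \frac{dx}{x}, \tag{3.1}$$

and $I_2(z)$ is the integral of the same function over $(-\infty, 0)$, [1]. Now $I_1(z)$ may be written as

$$I_1(z)\sqrt{1 - r^2} = \int_0^{\infty} \varphi(t_1)\varphi(t_2)\beta(t_3)\, \frac{dx}{x}, \tag{3.2}$$

where

$$\varphi(t) = \frac{e^{-t^2/2}}{\sqrt{2\pi}}, \qquad \beta(t_3) = e^{t_3}, \qquad t_3 = r t_1 t_2, \qquad t_1 = \frac{x - \rho_1}{\sqrt{1 - r^2}}, \qquad t_2 = \frac{z/x - \rho_2}{\sqrt{1 - r^2}}.$$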



We readily obtain $I_1(z)\sqrt{1 - r^2}$ by forming the product of $\varphi(t_1)$, $\varphi(t_2)$, $\beta(t_3)$, and $1/x$, using numerical integration applying Weddle's formula, the Gregory-Newton formula, or the simple rectangular formula depending on circumstances. The rectangular formula [3] is remarkably accurate when the integrand in the interval 0 to $\infty$ or 0 to $-\infty$ is somewhat symmetrical. Appropriate tables for $\varphi(t_1)$, $\varphi(t_2)$ (see [4]), $\beta(t_3)$ (see [5]) and $1/x$ (see [6]) are readily available. In the important case of the independence of $x$ and $y$, $r = 0$ and (3.2) becomes

$$\int_0^{\infty} \varphi(t_1)\varphi(t_2)\, \frac{dx}{x}, \qquad t_1 = x - \rho_1, \quad t_2 = \rho_2 - \frac{z}{x}. \tag{3.3}$$
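For the independent case $r = 0$, the rectangular formula mentioned above can be sketched in Python as follows. This is a hedged illustration, not the author's computing scheme: it uses midpoints of equal subintervals, truncates the range at $|\rho_1| + 10$, and combines the contributions of positive and negative $x$ so that the weight is $1/|x|$. The names `phi` and `f_z`, the step size, and the truncation point are all assumptions.

```python
import math

def phi(t):
    """Standard normal frequency function."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def f_z(z, rho1, rho2, step=0.001):
    """Frequency function of z = x*y for independent unit-variance normals (r = 0)."""
    x_max = abs(rho1) + 10.0                # phi(x - rho1) is negligible beyond this
    n = int(round(x_max / step))
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step                # midpoint of the i-th subinterval
        total += phi(x - rho1) * phi(rho2 - z / x) / x      # x > 0, as in I_1
        total += phi(-x - rho1) * phi(rho2 + z / x) / x     # x < 0, weight 1/|x|
    return total * step

# Special case rho1 = 0, rho2 = 10 at z = 10, expressed in t_z units.
sigma_z = math.sqrt(0.0 ** 2 + 10.0 ** 2 + 1.0)
value = sigma_z * f_z(10.0, 0.0, 10.0)
```

Multiplying $f(z)$ by $\sigma_z$ converts the ordinate to $t_z$ units, so `value` can be compared with the correct value tabulated for $t_z = .9950372$ in Table I of this paper. The routine should not be used at $z = 0$, where $f(z)$ has a discontinuity when $r = 0$.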


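4. Approximations to $f(z)$. When $r = 0$, the standard seminvariants $\xi_3$ and $\xi_4$ of $z$ are

$$\xi_3 = \frac{6\rho_1\rho_2}{(\rho_1^2 + \rho_2^2 + 1)^{3/2}}, \qquad \xi_4 = \frac{6\{2(\rho_1^2 + \rho_2^2) + 1\}}{(\rho_1^2 + \rho_2^2 + 1)^2}, \tag{4.1}$$

remembering

$$\bar{z} = \rho_1\rho_2, \qquad \sigma_z = \sqrt{\rho_1^2 + \rho_2^2 + 1}.$$

In the Pearson system (see [7]) $\delta$, the criterion, is

$$\delta = \frac{2\xi_4 - 3\xi_3^2}{6 + \xi_4}, \tag{4.2}$$

and for the probability function of $z$

$$\delta = \frac{2(\rho_1^2 + \rho_2^2 + 1)\{2(\rho_1^2 + \rho_2^2) + 1\} - 18\rho_1^2\rho_2^2}{(\rho_1^2 + \rho_2^2 + 1)[(\rho_1^2 + \rho_2^2 + 1)^2 + 2(\rho_1^2 + \rho_2^2) + 1]}, \tag{4.3}$$

and if $\rho_1 = \rho_2 = \rho$

$$\delta = \frac{2(4\rho^2 + 1)(2\rho^2 + 1) - 18\rho^4}{(2\rho^2 + 1)[(2\rho^2 + 1)^2 + (4\rho^2 + 1)]}. \tag{4.4}$$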

Now $\delta = 0$, $\xi_3 \ne 0$, for the Type III function, and clearly $\lim_{\rho \to \infty} \delta = 0$. By use of (3.3) the accurate values of $f(t_z)$ have been calculated for various combinations of $\rho_1$ and $\rho_2$ and compared with the Type III approximation using $\bar{z}$, $\sigma_z$, $\xi_3$.

(4.5) Investigations so far completed show that for $\rho_1 \ge 4$ and $\rho_2 \ge 4$ simultaneously, and $|\delta| \le .008$, the Type III approximation will provide values of $t_z$ correct to three significant figures at least where

(4.6)    $.05 \ge \alpha \ge .005$.

These are the values of $t_z$ which would be needed in testing hypotheses. The exact values of $t_z$ for various values of $\rho_1$ and $\rho_2$ less than 4 will be determined, it is hoped, in the future and will be published along with the comparisons of the Type III values of $t_z$ with the accurate values of $t_z$ in the important borderline cases of $\rho_1 = \rho_2 = 2$, and $\rho_1 = \rho_2 = 3$. The values of $f(z)$ for $\rho_1 = \rho_2 = 2$ and $\rho_1 = \rho_2 = 4$ have been calculated but these are being withheld for a more complete table. The table of values of $\bar{z}$, $\sigma_z$, $\xi_3$, $\xi_4$, and $\delta$ (Table II) shows then that the Type III function is excellent along a band about $\rho_1 = \rho_2$, since $\xi_3 \ne 0$, and $\delta$ is very small.

We use the Gram-Charlier Type A series of three terms to approximate the probability function of $z$ in $t_z$ units

$$f(t_z) \sim \varphi(t_z) - \frac{\xi_3}{3!}\varphi^{(3)}(t_z) + \frac{\xi_4}{4!}\varphi^{(4)}(t_z), \tag{4.7}$$

in the usual notation.
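A short Python sketch of the three-term series may make the notation concrete. It assumes the usual convention that $\varphi^{(3)}(t) = -(t^3 - 3t)\varphi(t)$ and $\varphi^{(4)}(t) = (t^4 - 6t^2 + 3)\varphi(t)$, and it takes $\xi_3$ and $\xi_4$ from the $r = 0$ formulas; the function names are illustrative.

```python
import math

def phi(t):
    """Standard normal frequency function."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def gram_charlier_a(t, xi3, xi4):
    """Three-term Gram-Charlier Type A approximation to f(t_z)."""
    he3 = t ** 3 - 3.0 * t                  # phi'''(t)  = -he3 * phi(t)
    he4 = t ** 4 - 6.0 * t ** 2 + 3.0       # phi''''(t) =  he4 * phi(t)
    return phi(t) * (1.0 + xi3 / 6.0 * he3 + xi4 / 24.0 * he4)

# Special case rho1 = 0, rho2 = 10 with r = 0.
rho1, rho2 = 0.0, 10.0
s = rho1 ** 2 + rho2 ** 2
sigma_z = math.sqrt(s + 1.0)
xi3 = 6.0 * rho1 * rho2 / (s + 1.0) ** 1.5
xi4 = 6.0 * (2.0 * s + 1.0) / (s + 1.0) ** 2

t_values = [z / sigma_z for z in (10.0, 15.0)]      # z-bar = 0 in this case
normal_values = [phi(t) for t in t_values]
series_values = [gram_charlier_a(t, xi3, xi4) for t in t_values]
```

Since $\xi_3 = 0$ when $\rho_1 = 0$, only the fourth-order term corrects the normal curve in this case.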


TABLE I

    t_z           f(t_z)           Normal Curve     Gram-Charlier
                  Correct value                     Type A

   .9950372       .2406367         .2431716         .2408235
  1.4925558       .1275209         .130970          .127484
  1.9900744       .0538243         .0550708         .053704
  2.4875930       .0184006         .0180791         .0184500
  2.9851116       .0052477         .0046338         .0052944
  3.4826302       .0012009         .0009272         .0012804
  3.9801488       .0002611         .0001449         .000260
  4.4776674       .0000467         .0000177         .0000425
  4.9751860       .00000745        .00000108        .00000555


(4.8) For $|\xi_3| < .5$ and $\xi_4 < 4$ simultaneously the Gram-Charlier Type A series is quite adequate for finding probability levels such as those of (4.6). These will in general give 3 significant figures for $t_z$. In the special case $\rho_1 = 0$, $\rho_2 = 10$, the Gram-Charlier Type A series differs from $f(t_z)$ very slightly in the range $1 \le |t_z| < \infty$ (see Table I). Naturally the Gram-Charlier will be used wherever Type III is not indicated, although there exist some overlapping regions where either one may be used. It should be noticed that the approach of $f(z)$ to normality is more rapid along a row than down a diagonal. In case either $\rho_1$ or $\rho_2$ is negative, we may make use of the equation

$$f(z, -\rho_1, \rho_2, r) = f(-z, \rho_1, \rho_2, -r). \tag{4.9}$$

We note that when $r = 0$, $f(z, \rho_1, \rho_2)$ always possesses a discontinuity at $z = 0$ (see [1]). A table of $\bar{z}$, $\sigma_z$, $\xi_3$, $\xi_4$, and $\delta$ is provided for values of $\rho_1$ and $\rho_2$ from 0 to 10 inclusive.
TABLE II*

 rho_2 \ rho_1       2            4            6            8           10

     0             0            0            0            0            0
                   2.236068     4.123106     6.082762     8.062258    10.049876
                   0            0            0            0            0
                   2.160         .685121      .319942      .183195      .118224
                    .529         .205         .101         .059         .039

     2             4.           8.          12.          16.          20.
                   3.           4.582576     6.403124     8.306624    10.246951
                    .888889      .498784      .274250      .167493      .111531
                   1.259259      .557823      .289114      .172653      .113742
                    .020         .056         .056         .042         .031

     4                         16.          24.          32.          40.
                                5.744563     7.280110     9.          10.816654
                                 .506408      .373206      .263374      .189641
                                 .358127      .224279      .147234      .102126
                                -.0084        .0049        .014         .016

     6                                      36.          48.          60.
                                             8.544004    10.049876    11.704700
                                              .346314      .28373       .224503
                                              .163258      .118224      .087272
                                             -.0054       -.00083       .0038

     8                                                   64.          80.
                                                         11.357817    12.845233
                                                           .262088      .226472
                                                           .092663      .072507
                                                          -.0034       -.0015

    10                                                               100.
                                                                      14.177447
                                                                        .210551
                                                                        .059553
                                                                       -.0023

* The first value in a cell is $\bar{z}$, the second $\sigma_z$, the third $\xi_3$, the fourth $\xi_4$, the fifth $\delta$.
5. Some extensions. We may generalize our results to any case where $x$ and $y$ are distributed approximately in a normal distribution such as the distribution of the product of two means, when the sizes of the samples $N_1$ and $N_2$ are large and consequently $\rho_1$ and $\rho_2$ will be large. Another example occurs if $x$ and $y$ each follows a Bernoulli probability function with parameters $p_1$ and $p_2$ respectively where the number of trials in each case is large. We must warn the reader that the condition $\rho_1 \to \infty$, $\rho_2 \to \infty$ alone does not mean that the distribution of $z$ approaches normality. Both $x$ and $y$ must be distributed normally.

The actual problem which gave rise to this investigation was the question of determining the sum of a great many variates [8]. Let $T$ variates $v_1, v_2, \ldots, v_T$ be given whose sum $A = \sum v_i$ is desired. Clearly

$$A = T\bar{V}, \qquad \bar{V} = \sum_{i=1}^{T} v_i / T.$$

Now let us estimate $A$ by $X = T_s \bar{V}_s$, where $T_s$ is an estimate of $T$ and $\bar{V}_s$ is an estimate of $\bar{V}$. If $\sigma_{T_s}$ is very small, $\rho_1 = T/\sigma_{T_s}$ will be large and $\rho_2 = \bar{V}/\sigma_{\bar{V}_s} = \sqrt{N}\,\bar{V}/\sigma_v$ will be very large. Assuming $T_s$ is distributed normally, and obviously $\bar{V}_s$ is distributed normally for $N$ large, we see by the theorems of this paper that $X$ will be distributed normally. Confidence limits for $A$ may be calculated in the usual fashion as $X \pm \gamma\sigma_X$, where $\gamma$ is determined by



with $\alpha$ generally chosen as .025 or less and


= Vf;4_ + V; 4' 


2 2 

a v, a T,‘ 


Stratification is also possible. It is interesting to note that many functions which occur in life insurance are products. Such applications will be treated fully elsewhere. Naturally the critical region, whether both tails or one tail of the distribution should be used, depends on the alternatives to the hypothesis being tested.

Generalizations of the main theorem are possible for the probability function of $z = \prod_{i=1}^{r} x_i$ where $x_1, x_2, \ldots, x_r$ follow a multivariate normal probability function. These will be investigated in a later paper. It may be noted that J. B. S. Haldane has investigated the distribution of a product along different lines [9].
NOTES

This section is devoted to brief research and expository articles on methodology and other short items.


A REMARK ON CHARACTERISTIC FUNCTIONS

By A. Zygmund

University of Pennsylvania

1. Let $F(x)$, $-\infty < x < +\infty$, be a distribution function, and

$$\varphi(t) = \int_{-\infty}^{+\infty} e^{itx}\, dF(x)$$

its characteristic function. It is well known that the existence of $\varphi'(0)$ does not imply the existence of the absolute moment

$$\int_{-\infty}^{+\infty} |x|\, dF(x). \tag{1}$$

A simple example is provided by the function

$$\varphi(t) = C \sum_{n=2}^{\infty} \frac{\cos nt}{n^2 \log n},$$

where $C$ is a positive constant. Since the series on the right differentiated term by term converges uniformly (see [1]), $\varphi'(t)$ exists (and is continuous) for all values of $t$, and in particular at the point $t = 0$. Obviously $\varphi(t)$ is the characteristic function of the masses $C/2n^2 \log n$ concentrated at the points $\pm n$ for $n = 2, 3, \ldots$. The constant $C$ is such that the sum of all the masses is 1. The divergence of the series $\sum 1/n \log n$ implies that in this particular case the moment (1) is infinite.
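The slow divergence in this example can be seen numerically. The Python sketch below is only an illustration: the constant $C$ is approximated by truncating the normalizing series at a large index, which is an assumption of the sketch, and the function name `truncated_absolute_moment` is hypothetical. The absolute moment restricted to $|x| \le N$ equals $C \sum_{n=2}^{N} 1/(n \log n)$, which grows roughly like $C \log \log N$.

```python
import math

# Approximate C from the condition that the masses C/(2 n^2 log n) at +n and -n sum to 1.
NORMALIZING_TERMS = 200000
C = 1.0 / sum(1.0 / (n * n * math.log(n)) for n in range(2, NORMALIZING_TERMS + 1))

def truncated_absolute_moment(upper):
    """Contribution of the points |x| <= upper to the absolute moment (1)."""
    return C * sum(1.0 / (n * math.log(n)) for n in range(2, upper + 1))

moments = [truncated_absolute_moment(10 ** k) for k in (2, 3, 4, 5)]
increments = [b - a for a, b in zip(moments, moments[1:])]
```

Each tenfold increase of the cut-off adds a positive amount that shrinks only slowly, in line with the divergence of $\sum 1/n \log n$.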

In a recent paper (see [2], esp. p. 120, footnote), Fortet raises the problem of whether the existence of $\varphi'(0)$ implies the existence of the first algebraic moment

$$\int_{-\infty}^{+\infty} x\, dF(x) = \lim_{X \to +\infty} \int_{-X}^{X} x\, dF(x). \tag{2}$$

The main purpose of this note is to show that this is so. We shall even prove a slightly more general result.

A function $\psi(t)$ defined in the neighborhood of a point $t_0$ is said to be smooth at this point if

$$\lim_{h \to +0} \frac{\psi(t_0 + h) + \psi(t_0 - h) - 2\psi(t_0)}{h} = 0.$$

Clearly, if $\psi$ has a one-sided derivative at the point $t_0$, the derivative on the other side also exists and has the same value. Thus the graph of $\psi(t)$ has no angular point for $t = t_0$, and this explains the terminology. If $\psi'(t_0)$ exists and is finite, $\psi(t)$ is smooth for $t = t_0$. The converse is obviously false, since any function whose graph is symmetric with respect to $t = t_0$ is smooth at that point.

Theorem 1. If the characteristic function $\varphi(t)$ is smooth at the point 0, then a necessary and sufficient condition for the existence of $\varphi'(0)$ is the existence of the moment (2). The value of (2) is $-i\varphi'(0)$.

In particular, the existence and finiteness of $\varphi'(0)$ implies the existence of (2). That the converse is false, is obvious. For if $a_0, a_1, a_2, \ldots$ are positive numbers and $a_0 + 2a_1 + 2a_2 + \cdots = 1$, then $\varphi(t) = a_0 + 2\sum_{n=1}^{\infty} a_n \cos nt$ is the characteristic function of the distribution function $F(x)$ corresponding to masses concentrated at the integer points $\pm n$ and having the values $a_n$ there. Owing to the symmetry of the masses, the number (2) exists, and is zero even if $\varphi(t)$ is non-differentiable for $t = 0$ (we may e.g. take for $\varphi(t)$ the Weierstrass non-differentiable function $C \sum a^n \cos b^n t$, where $C$ is a suitable constant).

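Proof. We may write

$$\varphi(t) = \int_0^{\infty} \cos xt\, dG(x) + i \int_0^{\infty} \sin xt\, dH(x) = \psi_1(t) + i\psi_2(t)$$

where

$$G(x) = F(x) - F(-x), \qquad H(x) = F(x) + F(-x).$$

Thus

$$0 \le |\Delta H| \le \Delta G. \tag{3}$$

Since $\varphi(t)$ is smooth at the point 0, and since $\psi_1$ is even, $\psi_2$ odd,

$$0 = \lim_{h \to +0} \frac{\varphi(h) + \varphi(-h) - 2\varphi(0)}{h} = 2 \lim_{h \to +0} \frac{\psi_1(h) - \psi_1(0)}{h} = -2 \lim_{h \to +0} \int_0^{\infty} \frac{1 - \cos hx}{h}\, dG(x),$$

so that, replacing $h$ by $2h$,

$$\int_0^{\infty} \frac{\sin^2 hx}{h}\, dG(x) \to 0.$$

Since the integrand is positive we obtain successively

$$\int_0^{1/h} \frac{\sin^2 hx}{h}\, dG(x) = o(1), \qquad \int_0^{1/h} \frac{(hx)^2}{h}\, dG(x) = o(1),$$

$$\int_0^{1/h} x^2\, dG(x) = o(h^{-1}), \tag{4}$$

$$\int_{1/2h}^{1/h} x^2\, dG(x) = o(h^{-1}),$$

$$\int_{1/2h}^{1/h} dG(x) = o(h), \tag{5}$$

as $h \to 0$.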
Since $\psi_1(t)$ is even, the smoothness of $\varphi(t)$, and so also of $\psi_1(t)$, at the point $t = 0$ implies that $\psi_1'(0)$ exists and is zero. If $h \to +0$,

$$\frac{\psi_2(h) - \psi_2(0)}{h} = \int_0^{\infty} \frac{\sin xh}{h}\, dH(x) = \int_0^{1/h} + \int_{1/h}^{\infty} = A_h + B_h,$$

$$|B_h| \le h^{-1} \int_{1/h}^{\infty} |dH| \le h^{-1} \left(\int_{1/h}^{2/h} dG + \int_{2/h}^{4/h} dG + \int_{4/h}^{8/h} dG + \cdots\right) = h^{-1} o(h + h/2 + h/4 + \cdots) = o(1),$$

by (3) and (5). Also

$$A_h - \int_0^{1/h} x\, dH = \int_0^{1/h} \left(\frac{\sin hx}{hx} - 1\right) x\, dH = \int_0^{1/h} O(x^2h^2)\, x\, dG = \int_0^{1/h} O(x^2h)\, dG = o(1),$$

by (3) and (4). Thus

$$\frac{\psi_2(h) - \psi_2(0)}{h} = o(1) + \int_0^{1/h} x\, dH = o(1) + \int_{-1/h}^{1/h} x\, dF,$$

and so

$$\frac{\varphi(h) - \varphi(0)}{h} = o(1) + i \int_{-1/h}^{1/h} x\, dF.$$

It follows that the existence of (2) is equivalent to the existence of the right-hand side derivative of $\varphi(t)$ at the point $t = 0$, or, on account of smoothness, to the existence of $\varphi'(0)$. Moreover, the value of (2) is $-i\varphi'(0)$. This completes the proof of Theorem 1.


2. Suppose that a function $\psi(t)$ defined near the point $t_0$ satisfies for $h \to 0$ a relation

$$\psi(t_0 + h) = a_0 + a_1 h/1! + \cdots + a_{k-1} h^{k-1}/(k - 1)! + [a_k + o(1)] h^k/k!,$$

where $a_0, a_1, \ldots, a_k$ are constants. Then $a_k$ is called the $k$th generalized derivative of $\psi$ at the point $t_0$. It will be denoted by $\psi_{(k)}(t_0)$. The existence and finiteness of $\psi^{(k)}(t_0)$ implies the existence of $\psi_{(k)}(t_0)$, and both numbers are equal.

Another generalization of higher derivatives is based on the consideration of the symmetric differences

$$\Delta_h^1 \psi(t_0) = \psi(t_0 + h) - \psi(t_0 - h),$$

$$\Delta_h^2 \psi(t_0) = \psi(t_0 + 2h) - 2\psi(t_0) + \psi(t_0 - 2h),$$

$$\Delta_h^3 \psi(t_0) = \psi(t_0 + 3h) - 3\psi(t_0 + h) + 3\psi(t_0 - h) - \psi(t_0 - 3h).$$

If $\Delta_h^k \psi(t_0)/(2h)^k$ tends to a limit as $h \to +0$, this limit is called the $k$th symmetric derivative of $\psi$ at the point $t_0$. We shall denote it by $D_k\psi(t_0)$. Clearly, $D_k\psi(t_0)$ exists and equals $\psi_{(k)}(t_0)$, if the latter number exists.

It is a simple matter to prove (see [3]) that if $k$ is a positive even integer, and if the characteristic function $\varphi(t)$ has at $t = 0$ a finite symmetric derivative $D_k\varphi(0)$, then the moment $\int_{-\infty}^{+\infty} x^k\, dF(x)$ exists, and its value is $(-1)^{k/2} D_k\varphi(0)$. Conversely, the existence of $\int_{-\infty}^{+\infty} x^k\, dF(x)$ obviously implies (for $k$ even) the existence and continuity of $\varphi^{(k)}(t)$ for all $t$, and in particular at the point $t = 0$.

In order to obtain an extension of Theorem 1 to the case of derivatives of odd order, we have to generalize the notion of smoothness. We shall say that a function $\psi(t)$ satisfies for $t = t_0$ condition $S_k$, $(k = 1, 2, \ldots)$, if

$$\Delta_h^{k+1} \psi(t_0) = o(h^k) \quad \text{as } h \to +0.$$

For $k = 1$, condition $S_k$ is identical with smoothness at $t_0$. Clearly, if $\psi_{(k)}(t_0)$ exists, $\psi$ satisfies condition $S_k$ at $t_0$.

Theorem 2. Suppose that $k$ is a positive odd integer, and let $\varphi(t)$ be the characteristic function of a distribution function $F(x)$. If $\varphi$ satisfies condition $S_k$ at the point 0, a necessary and sufficient condition for the existence of $D_k\varphi(0)$ is the existence of the symmetric moment

$$\int_{-\infty}^{+\infty} x^k\, dF(x) = \lim_{X \to +\infty} \int_{-X}^{X} x^k\, dF(x) \tag{6}$$

whose value is then equal to $i^{-k} D_k\varphi(0)$. In particular, the existence of $\varphi_{(k)}(0)$ implies that of (6).

The proof of Theorem 2 is analogous to that of Theorem 1. Let $G(x)$ and $H(x)$ have the same meaning as before. Since $k + 1$ is even, condition $S_k$ at the point $t = 0$ gives

$$\Delta_h^{k+1} \varphi(0) = \int_{-\infty}^{+\infty} (e^{ixh} - e^{-ixh})^{k+1}\, dF(x) = 2^{k+1} (-1)^{(k+1)/2} \int_{-\infty}^{+\infty} (\sin xh)^{k+1}\, dF(x)$$

$$= 2^{k+1} (-1)^{(k+1)/2} \int_0^{\infty} (\sin xh)^{k+1}\, dG(x) = o(h^k),$$

so that

$$\int_0^{1/h} (\sin xh)^{k+1}\, dG(x) = o(h^k),$$

$$\int_0^{1/h} x^{k+1}\, dG(x) = o(h^{-1}), \tag{7}$$

$$\int_{1/2h}^{1/h} dG(x) = o(h^k). \tag{8}$$
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On the other hand, 

.-&A^(0) _ I - +M /sin xh' 

1 (2 h) k J-cc \ xh , 

f'l/h r 

= +/ = A/, + B h , 

Jo J 1 lh 

say. Here 

r p2th fHh 

dG(x ) = ir k + + • • 

1 111 L J Uh J 2//1 

= h~ k [o(h k ) + 0^0 + • J = 0 ( 1 ), 

by (8). Since 

( S -^y = {1 + 0(u 2 )} k = {1 + 0(u)} k = 1 + 0(u) 
for small u, we immediately obtain 

fl/A fl/A 

A h - x k dH(x) = 0{hx k+1 ) dG(x) = o(l), 

Jq Jo 

by (7). Collecting the results, we see that 



which completes the proof of Theorem 2. 

One more remark. By Theorem 2, the existence of the first moment is equivalent
to the existence of the first symmetric derivative

$$D_1\varphi(0) = \lim_{h\to 0}\,[\varphi(h) - \varphi(-h)]/2h.$$

In Theorem 1 we have a corresponding result for the ordinary first derivative
$\varphi'(0) = \lim_{h\to 0}\,[\varphi(h) - \varphi(0)]/h$.

There is no discrepancy here since at every point where $\varphi$ is smooth the two
notions of derivative are equivalent.
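
As a small numerical illustration of the remark above (a sketch only, not part of the original argument), one may compare the symmetric and the ordinary difference quotients of a characteristic function at 0. The unit exponential distribution, with characteristic function 1/(1 - it) and first moment 1, is an assumed example chosen for convenience; the function names are hypothetical.

```python
def phi_exponential(t):
    """Characteristic function of the unit exponential distribution (mean 1)."""
    return 1 / (1 - 1j * t)


def symmetric_first_derivative(phi, h):
    """Symmetric difference quotient [phi(h) - phi(-h)] / 2h at the point 0."""
    return (phi(h) - phi(-h)) / (2 * h)


def ordinary_first_derivative(phi, h):
    """Ordinary difference quotient [phi(h) - phi(0)] / h at the point 0."""
    return (phi(h) - phi(0)) / h


h = 1e-4
# For k = 1 the moment equals i^{-1} times the derivative at 0.
moment_from_symmetric = (symmetric_first_derivative(phi_exponential, h) / 1j).real
moment_from_ordinary = (ordinary_first_derivative(phi_exponential, h) / 1j).real
print(moment_from_symmetric, moment_from_ordinary)
```

Both quotients approach the first moment as h decreases, in line with the equivalence of the two notions of derivative at a point of smoothness.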
A LOWER BOUND FOR THE VARIANCE OF SOME UNBIASED
SEQUENTIAL ESTIMATES 

By D. Blackwell and M. A. Girshick 
Howard University and Bureau of the Census 

Consider a sequence of independent chance variables $x_1, x_2, \ldots$ with identical
distributions determined by an unknown parameter $\theta$. We assume that $E x_1 = \theta$
and that $W_k = x_1 + \cdots + x_k$ is a sufficient statistic for estimating $\theta$ from
$x_1, \ldots, x_k$. A sequential sampling procedure is defined by a sequence of
mutually exclusive events $S_k$ such that $S_k$ depends only on $x_1, \ldots, x_k$ and
$\sum P(S_k) = 1$. Define $W = W_k$ and $n = k$ when $S_k$ occurs. In a previous paper
by one of the authors [1] it was shown that if $S_k = W_k\,C(S_1 + \cdots + S_{k-1})$
(where $C(A)$ denotes the event that $A$ does not occur), the function $V(W, n) =
E(x_1 \mid W, n)$ is an unbiased estimate of $\theta$, and $\sigma^2(V) \le \sigma^2(x_1)$. It is the purpose
of this note to obtain a lower bound for $\sigma^2(V)$. Our result is

Theorem I. $\sigma^2(V) \ge \sigma^2(x_1)/E(n)$.

We remark that the lower bound is actually attained in the classical case of
samples of constant size $N$. For in this case (see [1]), $V = E(x_1 \mid W_N) = W_N/N$.
In fact we shall show that in a sense this is the only case in which the lower bound
is attained.

The proof of Theorem I depends on certain properties of sums of independent
chance variables. These, formulated more generally than is required for the
proof of Theorem I, are given in

Theorem II. Let $x_1, x_2, \ldots$ be independent chance variables with identical
distributions, having mean $\theta$ and variance $\sigma^2(x_1)$. Let furthermore $\{S_i\}$ be any
sequential test for which $E(n)$ is finite. Let $W = x_1 + \cdots + x_k$ when $n = k$.
Then

(a) $\sigma^2(W - \theta n) \le \sigma^2(x_1)E(n)$.

(b) If $\sigma^2(n)$ is finite, the equality sign holds in (a).

(c) $E[x_1(W - \theta n)] = \sigma^2(x_1)$.

Proof of (a). Write $y_i = x_i - \theta$, and define $Y = y_1 + \cdots + y_k$ when
$n = k$. By definition,

(1)    $\sigma^2(W - \theta n) = \sum_{k=1}^{\infty} \int_{S_k} (y_1 + \cdots + y_k)^2\,dP$.

To prove (a), we must verify that the series on the right of expression (1)
converges and has sum $\le \sigma^2(x_1)E(n)$. Now

(2)    $\sum_{k=1}^{N} \int_{S_k} (y_1 + \cdots + y_k)^2\,dP \le \sum_{k=1}^{N} \int_{S_k} (y_1 + \cdots + y_k)^2\,dP + \int_{\{n > N\}} (y_1 + \cdots + y_N)^2\,dP$

       $= \sum_{k=1}^{N} \int_{\{n \ge k\}} y_k^2\,dP + 2\sum_{k=2}^{N} \int_{\{n \ge k\}} y_k(y_1 + \cdots + y_{k-1})\,dP$.
Since the event $\{n \ge k\}$ is independent of $y_k$, each term in the second sum
vanishes and the first sum becomes

(3)    $\sum_{k=1}^{N} \int_{\{n \ge k\}} y_k^2\,dP = \sigma^2(x_1) \sum_{k=1}^{N} P\{n \ge k\}$

       $= \sigma^2(x_1)\,[P\{n = 1\} + 2P\{n = 2\} + \cdots + NP\{n = N\} + NP\{n > N\}] \le \sigma^2(x_1)E(n)$.

This establishes Theorem II(a).
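
The relation between $\sigma^2(W - \theta n)$ and $\sigma^2(x_1)E(n)$ can be checked exactly on a small case. The sketch below is an illustration under assumed choices that are not in the original note: the $x_i$ take the values 0 and 1 with $P(x_i = 1) = p$, and sampling stops as soon as $W \ge 2$ or after at most six observations. Because $n$ is bounded here, part (b) of Theorem II applies and the two sides should agree exactly.

```python
from fractions import Fraction
from itertools import product


def stopped_moments(p, max_steps, should_stop):
    """Exact E(n), E(W - theta*n) and E[(W - theta*n)^2] for 0/1 variables.

    should_stop(k, w) decides, from the first k observations only (through
    their sum w), whether sampling stops at step k; sampling always stops
    at max_steps.
    """
    theta = p  # mean of a single 0/1 observation
    e_n = e_dev = e_dev_sq = Fraction(0)
    for path in product((0, 1), repeat=max_steps):
        prob = Fraction(1)
        for x in path:
            prob *= p if x == 1 else 1 - p
        w = 0
        for k, x in enumerate(path, start=1):
            w += x
            if should_stop(k, w) or k == max_steps:
                break
        dev = w - theta * k
        e_n += prob * k
        e_dev += prob * dev
        e_dev_sq += prob * dev * dev
    return e_n, e_dev, e_dev_sq


p = Fraction(1, 3)
e_n, e_dev, e_dev_sq = stopped_moments(p, 6, lambda k, w: w >= 2)
variance_x = p * (1 - p)
print(e_n, e_dev, e_dev_sq, variance_x * e_n)
```

Exact rational arithmetic is used so that the comparison does not depend on rounding.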

Proof of Theorem II(b). Write $z_i = |y_i|$ and let $Z = z_1 + \cdots + z_k$ when
$n = k$. From (a) it follows that $\sigma^2[Z - nE(z_1)]$ is finite. If in addition
$\sigma^2(n) < \infty$, then $E(Z^2) < \infty$. Thus the series

(4)    $\sum_{k=1}^{\infty} \int_{S_k} (z_1 + \cdots + z_k)^2\,dP = \sum_{1 \le i,j \le k < \infty} \int_{S_k} z_i z_j\,dP$

converges, so that the series

(5)    $\sum_{1 \le i,j \le k < \infty} \int_{S_k} y_i y_j\,dP$

converges absolutely. The terms of the latter series may be arranged to yield

(A):   $\sum_{k=1}^{\infty} \int_{S_k} (y_1 + \cdots + y_k)^2\,dP = \sigma^2(W - \theta n)$

or to yield

(B):   $\sum_{k=1}^{\infty} \int_{\{n \ge k\}} y_k^2\,dP + 2\sum_{k=2}^{\infty} \int_{\{n \ge k\}} y_k(y_1 + \cdots + y_{k-1})\,dP = \sigma^2(x_1)E(n)$.

This proves Theorem II(b).

Proof of Theorem II(c). It follows from Theorem II(a) that $E[x_1(W - \theta n)]$
is finite. If we show that

(6)    $E(W - \theta n \mid x_1) = x_1 - \theta$, i.e. $E(Y \mid y_1) = y_1$, it will follow [1] that

(7)    $E[x_1(W - \theta n)] = E[x_1(x_1 - \theta)] = \sigma^2(x_1)$.

To verify (6), it is sufficient to show that if $f(x_1)$ is the characteristic function
of an event depending only on $x_1$ (i.e. $f(x_1) = 1$ when the event occurs, $f(x_1) = 0$
otherwise)

(8)    $E(f y_1) = E(fY)$.

Write $\varphi_1 = 0$, $\varphi_i = f \cdot (y_2 + \cdots + y_i)$, $i \ge 2$.

Then it is easily verified that

(9)    $E(\varphi_j \mid x_1, \ldots, x_i) = \varphi_i$ for $j > i$.

Hence it follows [2] that $E\varphi = 0$ where $\varphi = \varphi_i$ when $n = i$. In our case $\varphi =
fY - f y_1$, and $E\varphi = 0$ yields (6). This completes the proof of Theorem II.
Proof of Theorem I. In [1] it is proved that $E[x_1(W - \theta n)] = E[V(W - \theta n)]$.

Hence employing Theorem II we get

(12)    $\sigma^2(x_1) = E[V(W - \theta n)] = \sigma(V)\,\sigma(W - \theta n)\,\rho$

where $\rho$, $(0 < \rho \le 1)$, is the coefficient of correlation between $V$ and $W - \theta n$.
Substituting for $\sigma(W - \theta n)$ we get

(13)    $\sigma^2(x_1) \le \sigma(V)\,\sigma(x_1)\sqrt{E(n)}\,\rho \le \sigma(V)\,\sigma(x_1)\sqrt{E(n)}$.

Solving for $\sigma(V)$ we finally obtain

(14)    $\sigma^2(V) \ge \sigma^2(x_1)/E(n)$

which proves Theorem I.¹

If $\sigma(n)$ is finite, the equality sign in (14) will hold if and only if $\rho = 1$. We
shall now prove the following

Theorem III. Let $N$ be the minimum value of $n$ for which $P(n = N) \ne 0$.
Then, a necessary and sufficient condition that $\rho = 1$ is that $P(n = N) = 1$.

Proof. The sufficiency of this condition follows from the fact that if
$P(n = N) = 1$, $V = W/N$. To prove the necessity of this condition, we
observe that if $\rho = 1$, $V$ is a linear function of $W - n\theta$. That is,

(15)    $V = \alpha(W - n\theta) + \beta$.

Now, since $EV = \theta$ and $E(W - n\theta) = 0$, it follows that $\beta = \theta$. Also, since
by hypothesis $\sigma^2(V) = \sigma^2(x_1)/E(n)$ and $\sigma^2(W - n\theta) = \sigma^2(x_1)E(n)$, it follows
that $\alpha = 1/E(n)$. Hence the estimate $V$ is given by

(16)    $V = \dfrac{W - n\theta}{E(n)} + \theta$.


¹ Under certain regularity conditions Cramér has obtained the inequality
$\sigma^2(x_1) \ge 1/E[(\partial \log f/\partial\theta)^2]$,
where $f = f(x, \theta)$ is the density function of $x$ ([3], p. 475). Thus with the same regularity
conditions, our inequality yields

$\sigma^2(V) \ge 1/\{E(n)\,E[(\partial \log f/\partial\theta)^2]\}$,

which is a special case of the results presented by J. Wolfowitz in this issue of the Annals.
Let $N$ be defined as above. We note that $N < \infty$ since by hypothesis $E(n) < \infty$.
Let $V_N$ be the estimate of $\theta$ when the sequential test terminates with $n = N$.
Then $V_N = W/N$. Substituting this value in (16) we get

(17)    $\dfrac{W}{N} = \dfrac{W - N\theta}{E(n)} + \theta$.

We exclude the trivial case where $W = N\theta$. Then (16) yields $E(n) = N$.
That is, $P(n = N) = 1$. This proves the theorem.

We remark that $N$ may be a function of $\theta$ but for a fixed $\theta$, $n = N$ is fixed
when $\rho = 1$.
AN EXTENSION TO TWO POPULATIONS OF AN ANALOGUE OF
STUDENT'S t-TEST USING THE SAMPLE RANGE

By John E. Walsh 
Princeton University 

1. Summary. The modified t-test considered by Daly¹ (see [1]) is used to
develop one-sided significance tests to decide whether the mean of a new normal
population exceeds the mean of an old normal population having the same
variance. Significance tests are also developed to decide whether the mean of
the new population is less than the mean of the old population. These tests
require very little computation for their application and are approximately as
powerful as the most powerful tests of these hypotheses.

2. Introduction. Let $r_1, \ldots, r_n$, $(n \le 10)$, be independently distributed
according to a normal distribution with zero mean and unit variance. Let $r_{(u)}$
denote the $u$th largest of the $r$'s. Then Daly has shown how to determine
numbers $g_\alpha$ such that

(1)    $\Pr[\bar{r}/(r_{(n)} - r_{(1)}) > g_\alpha] = \alpha$,

       $\Pr[\bar{r}/(r_{(n)} - r_{(1)}) < -g_\alpha] = \alpha$.

This note will use these relations to develop easily applied significance tests to
decide whether the mean $\nu$ of a new normal population exceeds the mean $\mu$ of


¹ This problem is also considered by Lord in [2]. This note was in proof when [2] appeared.
an old normal population with the same variance. Significance tests are also
developed to test $\nu < \mu$. The simplest case considered is that of testing a new
sample value $x$ on the basis of $n$ past sample values $y_1, \ldots, y_n$. Then the
significance test at significance level $\alpha$ to decide whether $\nu$ exceeds $\mu$ consists in
accepting $\nu > \mu$ if

$x > \bar{y} + g_\alpha \sqrt{n + 1}\,[y_{(n)} - y_{(1)}]$,

where $y_{(u)}$ is the $u$th largest of $y_1, \ldots, y_n$.

The significance test of $\nu < \mu$ consists in accepting $\nu < \mu$ if

$x < \bar{y} - g_\alpha \sqrt{n + 1}\,[y_{(n)} - y_{(1)}]$.

These tests are generalized to the case in which $x$ is the mean of a sample of
size $r$ from the new population, each of $y_1, \ldots, y_n$ is the mean of a sample of
size $s$ from the old population, and $z$ is the mean of a sample of size $t$ from the
old population. Then the tests at significance level $\alpha$ take the form

(2)    Accept $\nu > \mu$ if $x > (1 - C_1)\bar{y} + C_1 z + g_\alpha [y_{(n)} - y_{(1)}] \sqrt{(1 - C_1)^2 + \frac{ns}{r} + \frac{ns}{t} C_1^2}$,

       Accept $\nu < \mu$ if $x < (1 - C_1)\bar{y} + C_1 z - g_\alpha [y_{(n)} - y_{(1)}] \sqrt{(1 - C_1)^2 + \frac{ns}{r} + \frac{ns}{t} C_1^2}$,

where $C_1$ is a given constant which is selected by the person applying the test.
The introduction of the terms $z$ and $C_1$ allows less reliable past information to
be utilized by lumping it together in the $z$ term and using the constant $C_1$ to
weight this information according to its relative importance with respect to
the $y$'s.

The power of test (2) is compared with that of the corresponding Student t-test
for the case $C_1 = 0$ and $n \le 10$. In this comparison the quantities $x, y_1, \ldots, y_n$
are considered to be the given sample values which are used for the test, that is,
the quantities from which the means $x, y_1, \ldots, y_n$ were formed are not given.
It is found that the power of the Student t-test is only slightly greater than that
of the corresponding test (2). For the cases considered, however, it is well
known that the most powerful test of $\nu > \mu$ using the quantities $x, y_1, \ldots, y_n$
is the appropriate Student t-test. Similarly for testing $\nu < \mu$. Thus the tests
(2) considered are approximately as powerful as the most powerful tests of
$\nu > \mu$ and $\nu < \mu$ which use $x, y_1, \ldots, y_n$.

Examination of (2) shows that the amount of computation required for the
application of one of these tests is small. Consequently the tests (2) have the
desirable properties of being easily computed and nearly as powerful as any 
tests which could be used for the given hypotheses. This suggests their use in 
repetitive testing procedures which are concerned with the testing of the mean 
of a new sample on the basis of the means of previous samples. 

3. Statement of tests. In this section three significance tests of increasing
generality are stated. It is to be observed that each test is a particular example
of the test following it so that tests (A) and (B) are special cases of test (C).
The reason for stating tests (A) and (B) is that these tests have a much simpler
appearance and will cover most cases of practical application.

(A). Let each of $x, y_1, \ldots, y_n$ represent the mean of a sample of size $r$, let
the values of the sample whose mean is $x$ have the distribution $N(\nu, \sigma^2)$ and the
values of the samples whose means are $y_1, \ldots, y_n$ have distribution $N(\mu, \sigma^2)$,
where the notation $N(\xi, \sigma^2)$ denotes the normal distribution with mean $\xi$ and
variance $\sigma^2$. Then the significance test of $\nu > \mu$ at significance level $\alpha$ is

Accept $\nu > \mu$ if $x > \bar{y} + g_\alpha \sqrt{n + 1}\,[y_{(n)} - y_{(1)}]$.

The significance test to decide whether $\nu < \mu$ is

Accept $\nu < \mu$ if $x < \bar{y} - g_\alpha \sqrt{n + 1}\,[y_{(n)} - y_{(1)}]$.


(B). Let $x$ equal the mean of $r$ sample values from $N(\nu, \sigma^2)$ and each of
$y_1, \ldots, y_n$ equal the mean of $s$ sample values from $N(\mu, \sigma^2)$. The significance
test for $\nu > \mu$ at significance level $\alpha$ is

Accept $\nu > \mu$ if $x > \bar{y} + g_\alpha \sqrt{1 + \frac{ns}{r}}\,[y_{(n)} - y_{(1)}]$.

The test of $\nu < \mu$ is given by

Accept $\nu < \mu$ if $x < \bar{y} - g_\alpha \sqrt{1 + \frac{ns}{r}}\,[y_{(n)} - y_{(1)}]$.

(C). Let $x$ equal the mean of $r$ sample values from $N(\nu, \sigma^2)$, each of $y_1, \ldots, y_n$
equal the mean of a sample of size $s$ from $N(\mu, \sigma^2)$, $z$ equal the mean of a sample
of size $t$ from $N(\mu, \sigma^2)$, and $C_1$ be a given constant value. Then the significance
test of $\nu > \mu$ at significance level $\alpha$ is

Accept $\nu > \mu$ if

$x > (1 - C_1)\bar{y} + C_1 z + [y_{(n)} - y_{(1)}]\,g_\alpha \sqrt{(1 - C_1)^2 + \frac{ns}{r} + \frac{ns}{t} C_1^2}$.

The significance test to decide whether $\nu < \mu$ is

Accept $\nu < \mu$ if

$x < (1 - C_1)\bar{y} + C_1 z - [y_{(n)} - y_{(1)}]\,g_\alpha \sqrt{(1 - C_1)^2 + \frac{ns}{r} + \frac{ns}{t} C_1^2}$.

Values of $g_\alpha$ for $\alpha = .05$ are given in Table I. These values were listed by
Daly in [1].²


² Values of $g_\alpha$ for $\alpha$ = .05, .025, .01, .005, .001 and .0005 are listed in Table 9 of [2] for
sample sizes from 2 to 20.
4. Derivation of tests. As tests (A) and (B) are particular cases of test
(C), it is sufficient to derive test (C).

TABLE I

Estimated Values of $g_{.05}$

  n     g_.05
  3     .882
  4     .526
  5     .385
  6     .309
  7     .260
  8     .227
  9     .202
 10     .183

Let the quantities $x', y_1', \ldots, y_n', z'$ be defined by

$x' = \dfrac{(x - \nu)\sqrt{r}}{\sigma}$,   $y_i' = \dfrac{(y_i - \mu)\sqrt{s}}{\sigma}$,   $z' = \dfrac{(z - \mu)\sqrt{t}}{\sigma}$,   $(i = 1, \ldots, n)$.

Then $x', y_1', \ldots, y_n', z'$ are independently distributed according to $N(0, 1)$.
Define

$r_u = \dfrac{1}{K_1}\left(K_1 y_u' - \sum_{i=1}^{n} y_i' + K_2 x' + K_2 C z'\right)$,   $(u = 1, \ldots, n)$.

It is easily verified that

$E(r_u) = 0$,   $E(r_u^2) = \dfrac{1}{K_1^2}\left[K_1^2 + (1 + C^2)K_2^2 - 2K_1 + n\right]$,

$E(r_u r_v) = \dfrac{1}{K_1^2}\left[(1 + C^2)K_2^2 - 2K_1 + n\right]$,   $(u \ne v)$.

Thus, if $K_1$ and $K_2$ satisfy the equations

(3)    $\left(\sqrt{r/s} + C\sqrt{t/s}\right)K_2 + K_1 - n = 0$,

       $(1 + C^2)K_2^2 - 2K_1 + n = 0$,

the $r_u$ will be independent of $\mu$ when $\mu = \nu$. Also they will be independently
distributed according to $N(0, 1)$.
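Rewriting the $r_u$ in terms of $x, y_1, \ldots, y_n, z$ one obtains

(4)    $r_u = \dfrac{\sqrt{s}}{K_1\sigma}\left[K_1 y_u - \sum_i y_i + K_2\sqrt{\dfrac{r}{s}}\,x + K_2 C\sqrt{\dfrac{t}{s}}\,z + K_2\sqrt{\dfrac{r}{s}}\,(\mu - \nu)\right]$.

Using (3) the mean of the $r_u$ is found to be

$\bar{r} = \dfrac{K_2\sqrt{r}}{K_1\sigma}\left[x - \left(1 + C\sqrt{t/r}\right)\bar{y} + C\sqrt{t/r}\,z - (\nu - \mu)\right]$.

Let $r_{(u)}$ denote the $u$th largest of $r_1, \ldots, r_n$. Then from (1)

$\alpha = \Pr[\bar{r}/(r_{(n)} - r_{(1)}) > g_\alpha] = \Pr\left\{\dfrac{K_2\sqrt{r}}{K_1\sqrt{s}} \cdot \dfrac{x - \left(1 + C\sqrt{t/r}\right)\bar{y} + C\sqrt{t/r}\,z - (\nu - \mu)}{y_{(n)} - y_{(1)}} > g_\alpha\right\}$.

It is easily proved from (3) that

$\dfrac{K_1\sqrt{s}}{K_2\sqrt{r}} = \pm\sqrt{\dfrac{(\sqrt{r} + C\sqrt{t})^2 + ns(1 + C^2)}{r}}$.

Choosing the positive sign, putting $C = -C_1\sqrt{r/t}$, and letting $\mu = \nu$ one obtains

$\Pr\left\{x > (1 - C_1)\bar{y} + C_1 z + [y_{(n)} - y_{(1)}]\,g_\alpha \sqrt{(1 - C_1)^2 + \frac{ns}{r} + \frac{ns}{t} C_1^2}\right\} = \alpha$,

verifying the first part of test (C).

The second part of test (C) is verified by choosing the negative sign for
$K_1\sqrt{s}/(K_2\sqrt{r})$ (or by repeating the above argument using the second part of (1)).
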

5. Power comparison with t-test. Let $x, y_1, \ldots, y_n$ satisfy the conditions
of test (B) in section 3. Then Student's $t$ using $x, y_1, \ldots, y_n$ is given by

$t = [x - \bar{y} - (\nu - \mu)]\,\sqrt{\dfrac{n - 1}{\left(\frac{s}{r} + \frac{1}{n}\right)\sum_i (y_i - \bar{y})^2}}$.

The Student t-test based on this value of $t$ furnishes the most powerful test of
$\nu > \mu$ (and $\nu < \mu$) using $x, y_1, \ldots, y_n$. The purpose of this section is to show
that test (B) has approximately the same power as this Student t-test for $n \le 10$.

Daly has shown (see [1]) that if $r_1, \ldots, r_n$ are independently distributed
according to $N(\xi, \sigma^2)$, then the test based on

$(\bar{r} - \xi)/(r_{(n)} - r_{(1)})$

has approximately the same power for testing $\xi > 0$ (and $\xi < 0$) as the corresponding
Student t-test based on

(5)    $t = \dfrac{(\bar{r} - \xi)\sqrt{n(n - 1)}}{\sqrt{\sum_u (r_u - \bar{r})^2}}$

for $n \le 10$.

Using the notation of section 4 let

$r_u = \dfrac{\sqrt{s}}{K_1}\left[K_1 y_u - \sum_i y_i + K_2\sqrt{\dfrac{r}{s}}\,x\right]$,   $(u = 1, \ldots, n)$,

where $K_1/K_2 > 0$. Then from consideration of (4) with $C = 0$ it is seen that the $r_u$
are independently distributed according to $N(\xi, \sigma^2)$, where $\xi$ equals a positive
constant times $(\nu - \mu)$. Following the derivations in section 4 with $C = 0$,
it is seen that the test of $\xi > 0$ with this particular choice of the $r_u$ is identical
with the test of $\nu > \mu$ given in (B) of section 3. Similarly the test of $\xi < 0$ is
identical with the test (B) of $\nu < \mu$. Thus the test (B) has approximately the
same power for testing $\nu > \mu$ (and $\nu < \mu$) as the Student t-test based on the value
of $t$ given in (5) if $n \le 10$. Replacing the $r_u$ in (5) by their values in terms of
$x, y_1, \ldots, y_n, n, r$, and $s$, it is found that (5) becomes

$t = [x - \bar{y} - (\nu - \mu)]\,\sqrt{\dfrac{n - 1}{\left(\frac{s}{r} + \frac{1}{n}\right)\sum_i (y_i - \bar{y})^2}}$.

This proves that test (B) is approximately as powerful for testing $\nu > \mu$ and
$\nu < \mu$ as the most powerful test based on the quantities $x, y_1, \ldots, y_n$ if $n \le 10$.
As test (A) is a particular case of test (B), these results also apply to test (A).
ON THE NORM OF A MATRIX

By Albert H. Bowker 
University of North Carolina 

In studying the convergence of iterative procedures in matrix computation
and in setting limits of error after a finite number of steps, Hotelling [1] used
the square root of the sum of squares of the elements of a matrix as its norm. A
wide class of functions exists which may be employed as norms in matrix
calculation and substituted directly in the expressions derived by Hotelling. The
purpose of this note is to make a few general remarks about this class of functions
and to propose a new norm which appears to have some value in computation.
A function $\varphi(A)$ of the elements of a real matrix $A$ may be termed a legitimate
norm if it has the following four properties:

(1) $\varphi(cA) = |c|\,\varphi(A)$, $c$ a scalar;

(2) $\varphi(A + B) \le \varphi(A) + \varphi(B)$, if $A + B$ is defined;

(3) $\varphi(AB) \le \varphi(A)\varphi(B)$, if $AB$ is defined;

(4) $\varphi(e_{ij}) = 1$, where $e_{ij}$ is a fundamental unit matrix

whose elements are all zero except the one in the $i$th row and $j$th column, whose
value is unity. These four conditions are identical with the first four axioms
of Rella [2], who has shown them to be independent. Properties (1), (2), and
(3) are used directly in investigations of convergence and error, but the
importance of property (4) is indicated by some of its immediate consequences.
Clearly $e_i' A e_j = a_{ij}$, where $e_i$ is a fundamental unit vector. From (3) and (4)
it follows that $|a_{ij}| \le \varphi(A)$ for all $i$ and $j$ and we have that

(5)    $\max_{(i,j)} |a_{ij}| \le \varphi(A)$.

Thus $\varphi(A)$ has the useful property that the norm of a matrix of errors exceeds
or equals the maximum possible error. Since $\varphi(A^m) \le \varphi^m(A)$, it follows from
(5) that the elements of $A^m$ will tend to zero as $m$ increases if $\varphi(A) < 1$, a result
which is useful in establishing convergence. Also $\varphi(A) \ge 0$.

One further consequence of (1) to (4) is of interest. Suppose $A$ is a square
matrix and let $\lambda$ be any of its roots. Then there exists a non-null vector $x$
such that $Ax = \lambda x$. Now $\varphi(\lambda x) = |\lambda|\,\varphi(x) \le \varphi(A)\varphi(x)$ and we have

(6)    $|\lambda| \le \varphi(A)$.

Thus, every legitimate norm is an upper bound to the characteristic roots.
Clearly many functions exist which satisfy (1) to (4). The norm used by
Hotelling is $N(A) = \sqrt{\sum_{ij} a_{ij}^2}$. A new norm which may have some value is
obtained as follows.

(7)    $R(A) = \max_{(i)} R_i(A)$

where

$R_i(A) = \sum_j |a_{ij}|$.

Clearly $R(cA) = |c|\,R(A)$. To show that $R$ satisfies (2), consider

$R_i(A + B) = \sum_j |a_{ij} + b_{ij}| \le \sum_j |a_{ij}| + \sum_j |b_{ij}| \le R(A) + R(B)$.

Since the above inequality holds for all $i$,

$R(A + B) \le R(A) + R(B)$.
Now $AB = \| \sum_\alpha a_{i\alpha} b_{\alpha j} \|$

and

$R_i(AB) = \sum_j \left|\sum_\alpha a_{i\alpha} b_{\alpha j}\right| \le \sum_j \sum_\alpha |a_{i\alpha}| \cdot |b_{\alpha j}|$

$\le \sum_\alpha |a_{i\alpha}|\,R_\alpha(B) \le R(B)R(A)$.

Hence $R(AB) \le R(A)R(B)$. Clearly $R(e_{ij}) = 1$. Similarly it may be shown
that $C(A) = \max_{(j)} \sum_i |a_{ij}|$ also satisfies the conditions of a norm.
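
The norms discussed here are simple enough to compute directly. The sketch below is illustrative only: the function names and the two test matrices are hypothetical choices, and the code merely evaluates N(A), R(A), C(A) and the sum of absolute values on an example where R(A) is below one although N(A) is not.

```python
import math


def frobenius_norm(a):
    """N(A): square root of the sum of squares of the elements."""
    return math.sqrt(sum(x * x for row in a for x in row))


def max_row_sum(a):
    """R(A): largest sum of absolute values taken along a row."""
    return max(sum(abs(x) for x in row) for row in a)


def max_col_sum(a):
    """C(A): largest sum of absolute values taken along a column."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def abs_sum(a):
    """Sum of the absolute values of all elements."""
    return sum(abs(x) for row in a for x in row)


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


a = [[0.6, 0.3, 0.0], [0.0, 0.6, 0.3], [0.3, 0.0, 0.6]]
b = [[0.2, -0.4, 0.1], [0.0, 0.5, -0.3], [0.7, 0.0, 0.2]]
print(frobenius_norm(a), max_row_sum(a), max_col_sum(a), abs_sum(a))
print(max_row_sum(matmul(a, b)), max_row_sum(a) * max_row_sum(b))
```

For this matrix the row-sum norm already shows that the powers of A tend to zero, while the sum-of-squares norm exceeds one and gives no such conclusion.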

Since the convergence of an iterative procedure is often proved by the norm
being less than one, since the norm appears in the upper bound for the error
after a finite number of iterations, and since the norm of a matrix of errors is
taken to indicate the magnitude of the errors, a reasonable method of choosing
among several available legitimate norms is to select the smallest. It is natural
to inquire whether an optimum norm in this sense exists, that is, is there a
function $\varphi^*(A)$ such that $\varphi^*(A)$ possesses properties (1) through (4) and such
that $\varphi^*(A) \le \varphi(A)$ for all other $\varphi(A)$ satisfying these conditions. Assume such
a $\varphi^*(A)$ does exist. Clearly $\varphi^*(A) = \varphi^*(A')$, as, if either exceeded the other,
the smaller could be taken as $\varphi^*(A)$. Let $\Lambda^2$ be the largest root of $AA'$. Then
by (6)

$\Lambda^2 \le \varphi^*(AA') \le \varphi^{*2}(A)$ and $\Lambda \le \varphi^*(A)$.

But Rella [2] has shown that $\Lambda$ possesses (1) to (4). Thus

$\varphi^*(A) = \Lambda$.

But, for a row vector, $C(A) \le \Lambda$. Consequently, no minimal norm exists. It is
interesting to note that a worst norm does exist, namely $P(A) = \sum_{ij} |a_{ij}|$.
Since $A = \sum_{ij} e_{ij} a_{ij}$, $\varphi(A) \le P(A)$. Clearly $P(A)$ satisfies (1) to (4) and hence
is the worst possible legitimate norm.

In practical computation, the choice so far is between $N(A)$ and $R(A)$ (or
$C(A)$). No general inequalities exist and it would probably be advisable to
compute both. $R(A)$ may be less than $N(A)$ and indicate convergence when
$N(A)$ fails to do so. Often $R(A)$ may be computed visually and convergence
proved without computing the sum of squares of the elements.

The functions $N(A)$ and $R(A)$ may also be useful in finding a simple first
approximation to $A^{-1}$. A sufficient condition that Hotelling's iterative method
for finding the inverse of a matrix $A$ will converge is that the roots of
$D = 1 - AC_0$ be less than one in absolute value where $C_0$ is a first approximation
to $A^{-1}$. If the iterative procedure is to be carried out by a fully automatic
computing machine such as the one described by Alt [3] it may be advisable to
start with a rather poor first approximation which is easy to construct. If $A$
has positive roots and if $M$ is any upper bound to these roots and if $C_0$ is a matrix
with diagonal elements equal to $1/M$ and zeros elsewhere, the iterative procedure
will converge but the norm of $D$ will not necessarily be less than one. From
(6), any legitimate norm may be taken as $M$.
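
A minimal sketch of this starting rule follows. It assumes that the iterative method is the familiar recursion in which each approximation C is replaced by C(2 - AC), so that the error matrix 1 - AC is squared at every step; the note itself does not write the recursion out, and the function names and the test matrix are hypothetical. The bound M is taken as the row-sum norm R(A).

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hotelling_inverse(a, steps=8):
    """Approximate the inverse of a, starting from C_0 = (1/M) * identity.

    Assumption: the update is C <- C (2I - A C), and M is the row-sum norm
    of a, which bounds the characteristic roots from above.
    """
    n = len(a)
    m_bound = max(sum(abs(x) for x in row) for row in a)
    c = [[1.0 / m_bound if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        ac = matmul(a, c)
        correction = [[(2.0 if i == j else 0.0) - ac[i][j] for j in range(n)]
                      for i in range(n)]
        c = matmul(c, correction)
    return c


a = [[4.0, 1.0], [1.0, 3.0]]   # symmetric with positive roots
c = hotelling_inverse(a)
residual = matmul(a, c)        # should be close to the identity matrix
print(c)
print(residual)
```

The starting matrix is poor but costs nothing to construct, which is the trade-off described in the paragraph above.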
Finally, it is interesting to point out the relation of this note to some work on
the problem of finding upper bounds to the roots. In fact, the inequalities
$|\lambda| \le N(A)$ and $|\lambda| \le R(A)$, which are consequences of (6), are Theorem 2 of
Farnell [4] and Theorem 3 of Barankin [5] respectively.
DEFINITION OF THE PROBABLE DEVIATION

By M. Fréchet

Faculty of Science, University of Paris

The probable deviation has recently been defined by E. J. Gumbel [1], [2]
as the smallest of the intervals corresponding to the probability 1/2. It so
happened that the author was led to an equivalent definition starting from a general
idea which may be applied to absolutely general cases and which, for this reason,
might be of interest.

In recent years, the author has been occupied with a study of random
elements of any nature (curves, surfaces, functions, qualitative elements), a study
whose future seems promising [3]. I gave a definition of the mean of such an
element expressed by an abstract integral which, however, is only defined if the
random element is situated in a metric vectorial (Wiener-Banach) space.¹ But²
a still more general definition is valid if the random element is placed in any
metric space. It consists of taking, as mean position of the random element $X$,
a fixed (non-statistical) element $b = \bar{X}$ such that the function of $a$ which
represents the mean $M(X, a)^2$ of the squared distance of $X$ to the fixed element $a$,
is minimum for $a = b$. (In the case where $X$ and $a$ are numbers, and where
$M(X)^2$ is finite, we know that this minimum is reached and that there is one,
and only one, determination $b$ of $a$). This definition has the advantage of also
defining the equiprobable position of $X$. This is a fixed element $c = \breve{X}$ such
that $M(X, a)$ is minimum for $a = c$. (If $X$ and $a$ are numbers, we know that
this minimum is still reached, but may be so reached by several values of $a$).

Since reading Gumbel's paper, a still more general definition suggested itself.


¹ For the definition of metric vectorial spaces see [4].
² See Note 2, p. 503 of [4].
The expressions $M(X, a)$ and $\sqrt{M(X, a)^2}$ themselves may be considered as
distances, but as distances of two random elements taken together. To each
of these distances corresponds as minimum, when $a$ varies, a different "typical"
position, $\bar{X}$ or $\breve{X}$. Thus, without supposing anything about the space
into which the different trials place $X$, we assume that we have defined a
"deviation" of two random elements $X$, $Y$ taken together. We represent this
function of two random variables by $(([X], [Y]))$, a notation which differs from
the representation of the distance $(X, Y)$ of the two positions $X$ and $Y$ with
respect to a single trial. The lower boundary of the deviation $(([X], [a]))$, a
function of $a$, which is reached for a = X defines a "typical" position X.
Moreover, the value of this (([X], [X])) may be considered as a measure or, at least,
as a numerical ranging point of the dispersion of $X$.

Let us abandon these generalities. They hold especially if the element $X$
is a real valued random variable. Among the possible and reasonable
expressions for the deviation $(([X], [a]))$ of the numerical variate $X$ from a fixed number
$a$, we may use the equiprobable value of $|X - a|$ which may be called the
equiprobable deviation of $X$ from $a$. Thus we have, on one side, a new "typical
value" of $X$ which will be a value of $a$ such that the equiprobable deviation of $X$
from $a$ is minimum, and a new measure of dispersion which is the value of this
minimum and which might be called simply the equiprobable deviation of $X$.

In the case where $X$ has everywhere a continuous and finite density of
probability $w(X)$ we find, as typical value, what Gumbel calls the "midvalue"
and represents by i, and, as equiprobable deviation, what Gumbel calls the
"probable deviation" and represents by f.

We may also consider the discontinuous case, which was given as a problem
to candidates of the "Certificat d'Etudes Supérieures de Calcul des Probabilités,
Option Statistique Mathématique, Session May-June, 1944." They had to
solve various questions of which I cite the beginning below:

"Consider $n$ real numbers $x_1 \le x_2 \le \cdots \le x_n$ and represent, by $E_a$, a median
value of the deviations $|x_k - a|$ of the numbers $x_k$ and $a$. If $a$ varies, $E_a$ has
a minimum $E$ which is reached by one or several values $A$ of $a$.

1) Explain, in a few words, the meaning of the values $E$ and $A$.

2) For simplicity's sake, suppose that $n$ is odd ($n = 2r + 1$). How should
$E$ and $A$ be calculated practically? (To find the answer, investigate first how
$E_a$ varies if $a$ varies only slightly).

3) In the case where $n = 4s + 3$ ($s$ is an integer equal to, or larger than, zero)
show that $E \le \dfrac{Q_3 - Q_1}{2}$

where $Q_1 = x_{s+1}$, $Q_3 = x_{n-s}$."
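
One possible way of carrying out the calculation asked for in question 2 is sketched below. It is offered as an assumption, not as the solution intended by the examiners: for n = 2r + 1 the median of the deviations from a is the half-length of the shortest interval centred at a that holds r + 1 of the numbers, so E is taken as half the smallest difference between numbers r places apart, and A as the corresponding midpoint. The function names and the sample values are hypothetical.

```python
import statistics


def equiprobable_deviation(xs):
    """Return (E, A) for an odd number n = 2r + 1 of real values.

    E is half the length of the shortest interval containing r + 1 of the
    values, and A is the midpoint of that interval.
    """
    xs = sorted(xs)
    n = len(xs)
    if n % 2 == 0:
        raise ValueError("this sketch only treats an odd number of values")
    r = (n - 1) // 2
    best = min(range(n - r), key=lambda i: xs[i + r] - xs[i])
    e = (xs[best + r] - xs[best]) / 2
    a = (xs[best + r] + xs[best]) / 2
    return e, a


def median_deviation(xs, a):
    """E_a: the median of the deviations |x_k - a|."""
    return statistics.median(abs(x - a) for x in xs)


values = [1, 2, 4, 7, 8, 9, 15]          # n = 7 = 4s + 3 with s = 1
e, a = equiprobable_deviation(values)
half_interquartile = (values[5] - values[1]) / 2   # (Q3 - Q1) / 2
print(e, a, median_deviation(values, a), half_interquartile)
```

The last printed figure allows the bound of question 3 to be compared with E for this sample.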

The study of this new typical value and of this new equiprobable deviation 
has the advantage that their determination is very rapid and requires hardly 


³ See the Remark at end of note.
any calculations. However, we have to note an important inferiority of the
equiprobable deviation of $X$ compared to the mean and the standard deviations
of $X$. If one or the other of the last two deviations is zero, $X$ is a fixed number
(except for the case of the probability zero). This property seems requested
by the intuitive meaning which we attribute to the dispersion, and to every
measure or any mark of it. Now, the equiprobable deviation lacks this property.
If, for instance, $X$ has only three values: 0, 2, 1, the first two with the probability
0.249, and the last with the probability 0.502, the equiprobable deviation of $X$
will be zero, whereas $X$ will be equal to its typical value 1 only with a
probability of 0.502, and not with a probability equal to unity. The same holds
for any distribution for which there is a point with probability exceeding 1/2.
Remark. The definitions of the mean and of the equiprobable position become
meaningless in the case that $M(X, a)$, or $M(X, a)^2$, is infinite. However, we
succeeded in surmounting the difficulty, and in reaching definitions which are valid
even in this case. If $X$ is a number, the new definitions become equivalent to
the classical definitions of the mean and equiprobable value. The proofs are
given in two recent articles [5], [6].
THE GENERAL RELATION BETWEEN THE MEAN AND THE MODE
FOR A DISCONTINUOUS VARIATE

By M. Fréchet

Faculty of Science, University of Paris

Dr. Gumbel has pointed out that one of the author's arguments employed in
several particular cases (see [1]) can be employed in a general case which includes
them and leads to the following result: If a statistical variate $R$ has only positive
entire values differing from zero, and if its mean value $\bar{R}$ is smaller than, or
equal to, unity, the same holds for its equiprobable value $\breve{R}$ and its mode $\tilde{R}$.
There are two generalizations of this result which might be of interest:

1) On the one hand, the author has shown [2] that, if a variate $R$ can only
have values (entire or not) equal to, or larger than, zero, its equiprobable value
is, at most, equal to twice its mean value $\bar{R}$, and the inequality $\breve{R}/\bar{R} \le 2$
cannot be improved, which means that the upper boundary of the first member
is exactly equal to (and not less than) two. The equality is reached when $R$
has only two values of equal probability, one of them being zero.

2) On the other hand, if $R$ is an integer positive variate equal to, or larger
than zero, it can be proven that, if $\bar{R} \le \alpha$, we have

(1)    $\tilde{R} \le \dfrac{\alpha(\alpha + 3)}{2}$.

Here, $\bar{R}$ and $\tilde{R}$ stand for the mean and for the mode of $R$ respectively, and $\alpha$ is
a positive integer differing from zero. For example if $R$ is the number of
repetitions of an event with probability $p$, we have, for $n$ trials, $\bar{R} = np$, whence,
if $\alpha$ is the first integer number equal to, or larger than, $\bar{R}$ we have the inequality
(1) for the most probable number of repetitions. Naturally, this inequality
only has an interest if the second member of (1) is smaller than $n$ which means
that

$\alpha(\alpha + 3) < 2n$.

This presupposes

$2n > np(np + 3)$

or

$n < \dfrac{2 - 3p}{p^2}$

and, since $n$ must be positive,

$p < \tfrac{2}{3}$.

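To prove the inequality (1), let us write $\omega_\nu$ for the probability that $R = \nu$.
We have

$\sum_{\nu=0}^{\infty} \omega_\nu = 1$,   $\sum_{\nu=0}^{\infty} \nu\omega_\nu = \bar{R} \le \alpha$,

whence

(2)    $\sum_{\nu=0}^{\alpha-1} (\alpha - \nu)\omega_\nu \ge \sum_{\nu=\alpha+1}^{\infty} (\nu - \alpha)\omega_\nu$.

Let the mode be

$\tilde{R} = \beta$;

then

$\omega_\nu \le \omega_\beta$

and the first member in (2) is bounded by

(3)    $\sum_{\nu=0}^{\alpha-1} (\alpha - \nu)\omega_\nu \le \dfrac{\alpha(\alpha + 1)}{2}\,\omega_\beta$.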
Now, either $\alpha < \beta$ or $\beta \le \alpha$. In the first case the second member in (2) leads to

(4)    $\sum_{\nu=\alpha+1}^{\infty} (\nu - \alpha)\omega_\nu \ge (\beta - \alpha)\omega_\beta$

since the second member in (4) is one of the terms occurring in the sum. The
same inequality holds in the second case, $\beta \le \alpha$, hence it holds generally. It
follows from (2), (3), and (4) that

$\dfrac{\alpha(\alpha + 1)}{2}\,\omega_\beta \ge (\beta - \alpha)\omega_\beta$.

The probability $\omega_\beta$ is certainly different from zero, since $\sum_{\nu=0}^{\infty} \omega_\nu = 1$.
Consequently

$\beta - \alpha \le \dfrac{\alpha(\alpha + 1)}{2}$

or

$\beta \le \dfrac{\alpha(\alpha + 3)}{2}$

as stated in (1).
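
The inequality can be checked on concrete distributions. The sketch below is an illustration only; the helper names are hypothetical, the binomial case (20 trials, p = 1/10) is an assumed example, and the second distribution is an assumed instance with mean exactly 2 whose largest mode reaches the bound.

```python
from fractions import Fraction
from math import ceil, comb


def mean(pmf):
    """Mean of a variate taking the value v with probability pmf[v]."""
    return sum(v * w for v, w in enumerate(pmf))


def largest_mode(pmf):
    """Largest value whose probability equals the maximum probability."""
    top = max(pmf)
    return max(v for v, w in enumerate(pmf) if w == top)


def mode_bound(alpha):
    """Second member of (1): alpha * (alpha + 3) / 2, always an integer."""
    return alpha * (alpha + 3) // 2


def binomial_pmf(n, p):
    """Probabilities of 0, 1, ..., n repetitions in n trials."""
    return [comb(n, v) * p ** v * (1 - p) ** (n - v) for v in range(n + 1)]


binomial = binomial_pmf(20, Fraction(1, 10))
alpha_binomial = ceil(mean(binomial))   # first integer not below the mean

w = Fraction(3, 10)
extreme = [w, w, 1 - 3 * w, 0, 0, w]    # values 0, 1, 2 and 5 only
alpha_extreme = ceil(mean(extreme))

print(largest_mode(binomial), mode_bound(alpha_binomial))
print(largest_mode(extreme), mode_bound(alpha_extreme))
```

Exact fractions are used so that ties between probabilities are detected without rounding error.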

The equality in (1) is possible only if, from (3),

$\alpha(\omega_\beta - \omega_0) + (\alpha - 1)(\omega_\beta - \omega_1) + \cdots + (\omega_\beta - \omega_{\alpha-1}) = 0$

and from (4)

$\omega_{\alpha+1} + 2\omega_{\alpha+2} + \cdots + (\beta - \alpha)\omega_\beta + \cdots = (\beta - \alpha)\omega_\beta$

whence

(5)    $\omega_0 = \omega_1 = \cdots = \omega_{\alpha-1} = \omega_\beta$

and

(5')   $\omega_{\alpha+1} = \omega_{\alpha+2} = \cdots = 0$   (for every $\nu > \alpha$ other than $\beta$).

The existence of the exceptional case proves that the inequality (1) cannot
be improved by replacing the second member by a smaller function of $\alpha$. In
the exceptional case, the only possible values of $R$ are

$\nu = 0, 1, 2, \ldots, \alpha - 1, \alpha, \beta$,

and all values, except perhaps $\alpha$, are equiprobable. The probability $\omega_\alpha$ may
be, but need not be, equal to $\omega_\beta$.
Moreover 
and $\beta = \alpha$ is possible only if $\alpha = \beta = 0$ whence, from (5), $\omega_\nu = 0$ except for
$\nu = 0$ which means that $R$ only has one value equal to zero. Except for this
trivial case, we have in the exceptional case $\beta > \alpha$, and there are $\alpha + 2$ possible
values for $R$. Then we must have

$\sum_{\nu=0}^{\alpha-1} \omega_\nu + \omega_\alpha + \omega_\beta = 1$

whence

$(\alpha + 1)\omega_\beta + \omega_\alpha = 1$

and, from (5),

$\alpha \ge \bar{R} = \sum_{\nu=0}^{\alpha-1} \nu\omega_\nu + \beta\omega_\beta + \alpha\omega_\alpha = \omega_\beta\left(\dfrac{\alpha(\alpha - 1)}{2} + \dfrac{\alpha(\alpha + 3)}{2}\right) + \alpha\omega_\alpha = \alpha\left((\alpha + 1)\omega_\beta + \omega_\alpha\right)$

whence

(7)    $\bar{R} = \alpha$.

From

$1 = (\alpha + 1)\omega_\beta + \omega_\alpha \ge (\alpha + 2)\omega_\alpha$

follows

(8)    $\omega_\alpha \le \dfrac{1}{\alpha + 2}$,   $\omega_\beta = \dfrac{1 - \omega_\alpha}{\alpha + 1}$.

These conditions (5), (5'), and (7) are necessary and sufficient for the existence
of the exceptional case.

If the equality in (1) is excluded, the mode $\beta$ and the smallest integer number
$\alpha$ which is equal to, or larger than, the mean, are related by

(9)    $\beta \le \dfrac{\alpha(\alpha + 3)}{2} - 1 = \dfrac{\alpha^2 + 3\alpha - 2}{2}$.


As shown before, this general inequality, valid for any discontinuous variate, 
which can assume only non-negative integer values, cannot be improved without 
assuming specific properties of the distribution. 
NOTE ON DIFFERENTIATION UNDER THE EXPECTATION SIGN
IN THE FUNDAMENTAL IDENTITY OF SEQUENTIAL ANALYSIS

By T. E. Harris
Princeton University

Let $z$ be any chance variable and $z_1, z_2, \ldots$ a sequence of independent
chance variables, each with the same distribution as $z$. Let $Z_N = z_1 + z_2 + \cdots + z_N$.
Let $\varphi(t) = E e^{zt}$ for all complex $t$ for which the latter exists. Let $S_1, S_2, \ldots$
be a sequence of mutually exclusive events such that $S_j$ depends only
on $z_1, z_2, \ldots, z_j$, and $\sum_{j=1}^{\infty} P(S_j) = 1$. Let the chance variable $n$ be defined
as $n = j$ when $S_j$ occurs. Blackwell and Girshick [1], generalizing a result
of Wald [2], showed that if there is a positive constant $M$ such that

(1)    $|Z_N| < M$ when $n > N$

then the identity

(2)    $E\{e^{Z_n t} (\varphi(t))^{-n}\} = 1$

holds for all complex $t$ for which $\varphi(t)$ exists and $|\varphi(t)| \ge 1$. Wald [3] established
conditions, including the existence of $\varphi(t)$ for all real $t$, under which
(2) may be differentiated under the expectation sign an unlimited number
of times.

Without assuming the existence of $\varphi(t)$ for a real $t$-interval the following result
holds: If (1) is true and if $E(z^k)$ and $E(n^k)$ are both finite, $k$ a positive integer,
then

(3)    $E\left[\frac{d^k}{ds^k}\left(e^{iZ_n s} (\varphi(is))^{-n}\right)\right]_{s=0} = 0$

where $i = \sqrt{-1}$ and $s$ is real. Certain identities, obtained by differentiating
(2) and putting $t = 0$, can also be obtained from (3). For example, if $Ez = 0$,
and if $En^2$ and $Ez^2$ both exist, then $EZ_n^2 = Ez^2\,En$.
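
The example identity $EZ_n^2 = Ez^2\,En$ can be checked on a simple stopping rule. The sketch below is an illustration under assumptions that are not in the note: $z$ takes the values $+1$ and $-1$ with probability 1/2 each (so $Ez = 0$ and $Ez^2 = 1$), and sampling stops the first time $|Z_n| = a$, so that condition (1) holds with $M = a$. The function name `stopped_walk_moments` is hypothetical. The distribution of the walk before stopping is propagated step by step, and $En$ and $EZ_n^2$ are accumulated as probability mass is absorbed.

```python
def stopped_walk_moments(a=3, max_steps=2000):
    """Return (E n, E Z_n^2) for a +/-1 walk stopped when |Z_n| = a.

    Assumed setting for illustration: steps are +1 or -1 with
    probability 1/2 each; the tiny mass still unabsorbed after
    max_steps is neglected.
    """
    alive = {0: 1.0}     # P(Z_N = x and n > N) for each interior state x
    e_n = 0.0
    e_zn2 = 0.0
    for step in range(1, max_steps + 1):
        nxt = {}
        for x, prob in alive.items():
            for dz in (-1, 1):
                y = x + dz
                if abs(y) == a:
                    # sampling stops here: n = step and Z_n = y
                    e_n += step * prob * 0.5
                    e_zn2 += y * y * prob * 0.5
                else:
                    nxt[y] = nxt.get(y, 0.0) + prob * 0.5
        alive = nxt
    return e_n, e_zn2


e_n, e_zn2 = stopped_walk_moments(a=3)
# Here E z^2 = 1, so the identity reads E Z_n^2 = E n.
```

For this rule both sides equal $a^2$, which the computed values reproduce up to rounding.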

Let $P_N = P(n \le N)$, $p_N = P(n = N)$. Let $H(j, Z_j)$ and $F(N, Z_N)$ be the
conditional cumulatives of $Z_j$ and $Z_N$ for $n = j$ and $n > N$ respectively. Now
(2) was derived by Wald [2], p. 285, from a relation, valid whenever $\varphi(t)$ exists,
which in the present notation becomes

(4)    $\sum_{j=1}^{N} p_j (\varphi(t))^{-j} \int_{-\infty}^{\infty} e^{Z_j t}\, dH(j, Z_j) + (1 - P_N)(\varphi(t))^{-N} \int_{-M}^{M} e^{Z_N t}\, dF(N, Z_N) = 1.$

Examination of Wald's derivation of (4) shows it to be valid under the present
hypotheses. Now the finiteness of $E(z^k)$ clearly implies that of $E(Z_j^k \mid n = j)$.
Also, since $F(N, Z_N)$ is constant outside the interval $[-M, M]$, the integral
$\int Z_N^k\, dF(N, Z_N)$ is finite. Hence we may set $t = is$ in (4) and differentiate
$k$ times, obtaining for all real $s$

(5)    $\sum_{j=1}^{N} p_j \int_{-\infty}^{\infty} \frac{d^k}{ds^k}\left[(\varphi(is))^{-j} e^{iZ_j s}\right] dH(j, Z_j)$
       $\quad + (1 - P_N) \sum_{r=0}^{k} \binom{k}{r} \left[\frac{d^r}{ds^r} (\varphi(is))^{-N}\right] \int_{-M}^{M} (iZ_N)^{k-r} e^{iZ_N s}\, dF(N, Z_N) = 0.$

The derivatives of $(\varphi(is))^{-N}$ are sums of terms of the form $Q(N)(\varphi(is))^{-N-r}$
times terms independent of $N$, where $Q(N)$ is a polynomial in $N$ of degree $\le k$.
For any $r \le k$,

    $\lim_{N \to \infty} (1 - P_N) N^r = \lim_{N \to \infty} N^r \sum_{j=N+1}^{\infty} p_j \le \lim_{N \to \infty} \sum_{j=N+1}^{\infty} j^r p_j = 0$

since $En^k$ is finite. Hence $\lim (1 - P_N) Q(N) = 0$. Because of (1) the integrals
in the second term of (5) are bounded as $N \to \infty$. Now set $s = 0$ in (5)
and then let $N \to \infty$. Since $\varphi(0) = 1$, the second term of (5) approaches 0
and the limit of the first term is just the left side of (3).

For the case of a Wald sequential process, Stein [4] has shown that all moments
of $n$ are finite. In this case (3) holds whenever $Ez^k$ is finite.
A UNIQUENESS THEOREM FOR UNBIASED SEQUENTIAL
BINOMIAL ESTIMATION

By L. J. Savage¹
University of Chicago
In a recent note [1], J. Wolfowitz extended some of the results of a paper by
Girshick, Mosteller and Savage [2] on sequential binomial estimation. The
present note carries one of Wolfowitz's ideas somewhat further. The nomenclature
of [1] and [2] will be used freely. The concept of "doubly simple region"
introduced in [1] and assumed there only in the hypothesis of Theorem 3, will
here be shown to be unnecessarily restrictive. In so doing, we find that simplicity
is not only a necessary (cf. Theorem 4 of [2]) but also a sufficient condition
that $\hat{p}$ be the unique unbiased estimate of $p$ for a closed region.

¹ The author is a Rockefeller fellow at the Institute of Radiobiology and Biophysics,
University of Chicago.

Lemma. If $R$ is simple there is at most one bounded unbiased estimate of any
given function of $p$.

Proof. If the lemma were false, there would be a non-trivial bounded unbiased
estimate of zero, i.e., $m(\alpha)$ such that $|m(\alpha)|$ is bounded by a constant
$m^*$,

(1)    $E(m(\alpha) \mid p) = \sum m(\alpha) k(\alpha) p^y q^x = 0$

and $m(\alpha)$ not identically zero. Since $R$ is simple we may assume (much as in
the proof of Theorem 6 of [2]) that we have a boundary point $\alpha_0$ such that
$m(\alpha_0) \ne 0$, $\alpha_0$ is below all accessible points of its own index and also below
every other $\alpha$ for which $m(\alpha) \ne 0$. Therefore

(2)    $|m(\alpha_0)|\, k(\alpha_0) p^{y_0} q^{x_0} = \Big| \sum_{y > y_0} m(\alpha) k(\alpha) p^y q^x \Big| \le m^* \sum_{y > y_0} k(\alpha) p^y q^x.$

Let $M$ denote the set of all accessible points and boundary points at which
$x < x_0$ and $y = y_0 + 1$. There are at most $x_0$ points in $M$, say $\beta_1, \ldots, \beta_l$.
Considering the way in which $\alpha_0$ has been chosen, every path from $(0, 0)$ to an $\alpha$
for which $y > y_0$ passes through or to at least one point of $M$. Therefore when
$y > y_0$

(3)    $P(\alpha) = k(\alpha) p^y q^x = P(\alpha \mid M) P(M)$
       $\qquad \le P(\alpha \mid M) \sum_{i=1}^{l} k(\beta_i) p^{y_0+1} q^{x_i}$
       $\qquad \le p^{y_0+1} \sum_{i=1}^{l} k(\beta_i)\, P(\alpha \mid M).$

From inequalities (2) and (3),

(4)    $|m(\alpha_0)|\, k(\alpha_0) p^{y_0} q^{x_0} \le m^* p^{y_0+1} \Big\{ \sum_{i=1}^{l} k(\beta_i) \Big\} \sum_{y > y_0} P(\alpha \mid M)$
       $\qquad \le m^* p^{y_0+1} \sum_{i=1}^{l} k(\beta_i).$

But it is impossible that (4) should be satisfied for small $p$.

Combining the Lemma with Theorem 4 of [2] we have the

Theorem. A necessary and sufficient condition that $\hat{p}(\alpha)$ be the unique proper
(bounded) and unbiased estimate of $p$ for a closed region $R$ is that $R$ be simple.

The sufficiency part of this Theorem extends Theorem 3 of [1] from doubly
simple regions to simple regions.
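
The quantities $k(\alpha)$ and $P(\alpha) = k(\alpha) p^y q^x$ used in the proof can be made concrete by counting paths. The sketch below is illustrative only and rests on assumptions not stated in this note: a path moves from $(x, y)$ to $(x+1, y)$ with probability $q$ or to $(x, y+1)$ with probability $p$, the stopping region is the hypothetical rule "stop when $y \ge 2$ or $x \ge 3$", and the estimate is taken in the form $\hat{p}(\alpha) = k^*(\alpha)/k(\alpha)$, with $k^*(\alpha)$ the number of paths from $(0, 1)$ to $\alpha$, which is how the estimate of [2] is assumed to be defined here. All function names are hypothetical.

```python
from fractions import Fraction


def boundary_path_counts(stop, start, max_steps=1000):
    """Count the paths from `start` to each boundary point.

    A path moves from (x, y) to (x + 1, y) or (x, y + 1) and ends at
    the first point where stop(x, y) is true (hypothetical stop rule).
    """
    frontier = {start: 1}
    counts = {}
    for _ in range(max_steps):
        if not frontier:
            return counts
        nxt = {}
        for (x, y), k in frontier.items():
            if stop(x, y):
                counts[(x, y)] = k
            else:
                for point in ((x + 1, y), (x, y + 1)):
                    nxt[point] = nxt.get(point, 0) + k
        frontier = nxt
    raise ValueError("region does not close within max_steps")


def total_probability_and_mean(stop, p):
    """Return (sum of P(alpha), E[k*(alpha) / k(alpha)]) exactly."""
    p = Fraction(p)
    q = 1 - p
    k = boundary_path_counts(stop, (0, 0))
    k_star = boundary_path_counts(stop, (0, 1))   # assumed form of the estimate
    total = Fraction(0)
    mean = Fraction(0)
    for (x, y), paths in k.items():
        prob = paths * p**y * q**x                # P(alpha) = k(alpha) p^y q^x
        total += prob
        mean += Fraction(k_star.get((x, y), 0), paths) * prob
    return total, mean


def stop_rule(x, y):
    # Hypothetical closed region: stop at 2 successes or 3 failures.
    return y >= 2 or x >= 3


total, mean = total_probability_and_mean(stop_rule, Fraction(1, 3))
```

For this region the boundary probabilities sum to one and the expectation of $k^*(\alpha)/k(\alpha)$ equals $p$ exactly, which is the unbiasedness the Theorem refers to. Whether the estimate is the unique bounded unbiased one is the question the Theorem answers, and the sketch does not test it.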

The author is indebted to J. Wolfowitz for his valuable suggestions in connection
with the present note.
ACKNOWLEDGEMENT OF PRIORITY

By H. E. Robbins
University of North Carolina

At the time of publication of my papers on the measure of a random set
(Annals of Math. Stat., Vol. 15 (1944), pp. 70-74, Vol. 16 (1945), pp. 342-347),
I was unaware that the theorem on page 72 of the first paper, which
affords a means of computing the expected value of the measure, had already
been found by A. Kolmogoroff (Grundbegriffe der Wahrscheinlichkeitsrechnung,
Ergebnisse der Mathematik, Berlin, 1933, p. 41). I wish to take this
opportunity of acknowledging Kolmogoroff’s priority, which was pointed out
by Prof. Henry Scheffé.



ABSTRACTS OF PAPERS 

Presented on January 25, 1947, at the Atlantic City meeting of the Institute 

1. A Test of Significance of the Coefficient of Rank Correlation for more than
Thirty Ranked Items. Nilan Norris, Hunter College.

Hotelling and Pabst (Annals of Math. Stat., Vol. 7 (1936), p. 37) have suggested the use
of the Tchebycheff inequality as an approximation for testing the significance of the coefficient
of rank correlation in cases where the number of ranked items is too large to enable
exact probabilities to be computed directly. A table prepared in accordance with this
suggestion indicates that for values of the coefficient of rank correlation larger than .50
there is a wide range of corresponding numbers of ranked items greater than thirty for
which at least the five per cent level of significance is satisfied.

For certain types of applications the conservativeness of the Tchebycheff test may be
a virtue rather than a limitation.
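
A table of this kind can be sketched as follows. This is an illustration rather than the author's computation: it assumes the standard fact that, with no association and no ties, the rank correlation coefficient has mean zero and variance $1/(n-1)$ for $n$ ranked items, so that the Tchebycheff inequality gives $P(|r| \ge r_0) \le 1/((n-1) r_0^2)$. The function names are hypothetical, and exact rational arithmetic is used to avoid rounding at the boundary.

```python
import math
from fractions import Fraction


def tchebycheff_bound(r0, n):
    """Upper bound on P(|r| >= r0) for n ranked items, assuming no association.

    Uses Var(r) = 1/(n - 1), an assumption stated in the lead-in.
    """
    r0 = Fraction(r0)
    return min(Fraction(1), 1 / ((n - 1) * r0 * r0))


def min_items_for_level(r0, level="0.05"):
    """Smallest n for which the bound does not exceed `level`."""
    r0 = Fraction(r0)
    level = Fraction(level)
    return math.ceil(1 / (level * r0 * r0)) + 1


# Number of ranked items needed at the five per cent level
needed = {r0: min_items_for_level(r0) for r0 in ("0.5", "0.6", "0.7", "0.8", "0.9")}
```

Passing the coefficient as a string such as `"0.5"` keeps the arithmetic exact. The bound is conservative, which is the property the abstract remarks on.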

2. A Generalized T Measure of Multivariate Dispersion. Harold Hotelling,
University of North Carolina.

The problem of combining errors in two or more dimensions to measure the accuracy of
firing and bombing is similar to problems occurring in industrial quality control where
different measures of quality are applied to the same article, and to problems in mental
testing and other fields. If the covariances were known a priori, the solution optimum
in certain senses, for a multivariate normal distribution, would be the use of $\chi^2 = \sum\sum \lambda_{ij} x_i x_j$,
where $[\lambda_{ij}]^{-1}$ is the covariance matrix and $x_i$ is the deviation in the $i$th dimension. Since
the covariances must in all known practical cases be estimated from a preliminary sample
with (say) $n$ degrees of freedom, $\chi^2$ may be replaced by $T^2 = \sum\sum l_{ij} x_i x_j$, where $[l_{ij}]^{-1}$ is
the estimated covariance matrix. This is the same $T$ introduced by the author in 1931
as a generalization of the Student ratio $t$, and has the same distribution. Upon adding
together the values of $T^2$ for different cases (e.g. for different bombs dropped with the same
bombsight), a combined measure $T_0^2$ of over-all excellence (e.g. of the bombsight) is obtained.
$T_0^2$, like $\chi^2$, can be broken down into components meaningful with respect to the
causal system, specifically in relation to possible sources of excessive discrepancy. Thus,
if $\bar{x}_i$ is the $i$th coordinate of the centroid, or mean point of impact, of $m$ bombs, we may
write $T_M^2 = m \sum\sum l_{ij} \bar{x}_i \bar{x}_j$, $T_D^2 = T_0^2 - T_M^2$. Then $T_D$ is a function only of deviations from
the mean point of impact. Asymptotically (for large $n$), $T_0^2$, $T_M^2$ and $T_D^2$ have the $\chi^2$ distribution
with $m$, 2 and $m - 2$ degrees of freedom respectively. But the untrustworthiness
of the $\chi^2$ distribution as an approximation is evident even with $n$ as large as 256, for which
case calculations have been made. The exact distributions of $T_0$ and $T_D$ are ascertained
when the number of variates $p$ is 2, and the probability integrals are expressed as linear
functions of two incomplete beta functions. In fact, $T_0^2/m$ equals the sum of the roots
of a determinantal equation of the form $|A - \lambda B| = 0$, where $A$ and $B$ are sample covariance
matrices with $n$ and $m$ degrees of freedom respectively, and a similar relation holds for $T_D^2$
with $m$ replaced by $m - 2$. $T$ and $T_M$ have the distribution published in 1931, with probability
integral expressible in terms of a single incomplete beta function or the variance
ratio distribution. It is shown that such parameters as the circular mean deviation are
best estimated with the help of the $T$ measures, not directly by averaging individual circular
deviations.

3. Asymptotic Properties of Maximum and Quasi-Maximum Likelihood Estimates.
Herman Rubin, Cowles Commission for Research in Economics.

The results of J. L. Doob (Trans. Am. Math. Soc., Vol. 36 (1934), pp. 759-775) on consistency
of maximum likelihood estimates are generalized and extended to arbitrary measure
spaces. In some special cases, results on asymptotic normality of maximum likelihood
estimates can be generalized to quasi-maximum likelihood estimates (estimates based
on the assumption of a likelihood function which need not be the true function).

4. The Asymptotic Distribution of the Range. E. J. Gumbel, Newark College
of Engineering.

The asymptotic distribution of the range $w$ for initial unlimited distributions of the
exponential type is obtained by convolution of the asymptotic distributions of the two
extremes. Let $\alpha$ and $u$ be the parameters of the distributions of the extremes for a symmetrical
variate, and let $R = \alpha(w - 2u)$ be the reduced range. Then the probability
$\Psi(R)$ of the reduced range is subject to the differential equation $\Psi'' + \Psi' - \exp(-R)\,\Psi = 0$,
which may be transformed into Bessel's equation of the first order by the substitutions
$R = 2(\log 2 - \log z)$ and $\Psi = zU$. The solution is $\Psi(R) = zK_1(z)$ for the asymptotic probability,
and $\psi(R) = (z^2/2)K_0(z)$ for the asymptotic distribution, $K_0(z)$ and $K_1(z)$ being the
modified Bessel functions of the second kind of orders zero and unity. Thus tables of $\Psi(R)$
and $\psi(R)$ may be calculated for any symmetrical distribution of the exponential type.
The distribution of the range $w$ for normal samples of size 10 is already very close to the
asymptotic distribution provided that the parameters $\alpha$ and $u$ are determined from the
mean and the standard deviation of the range. This method permits the calculation of
the distribution of the range for normal samples of any size larger than 10.
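
The two formulas of this abstract can be evaluated without special tables. The following is a rough sketch under stated assumptions, not the author's method of tabulation: $K_\nu(z)$ is computed from the integral representation $K_\nu(z) = \int_0^\infty e^{-z\cosh t}\cosh(\nu t)\,dt$ by the trapezoid rule, and the function names, the truncation point and the step count are arbitrary choices made for illustration.

```python
import math


def bessel_k(nu, z, upper=20.0, steps=20000):
    """Modified Bessel function K_nu(z), z > 0, by the trapezoid rule.

    Uses K_nu(z) = integral over t from 0 to infinity of
    exp(-z cosh t) cosh(nu t), truncated at t = upper.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        term = math.exp(-z * math.cosh(t)) * math.cosh(nu * t)
        total += term * (0.5 if i in (0, steps) else 1.0)
    return total * h


def reduced_z(R):
    # Inverse of the substitution R = 2(log 2 - log z)
    return 2.0 * math.exp(-R / 2.0)


def range_probability(R):
    """Asymptotic probability Psi(R) = z K_1(z) of the reduced range."""
    z = reduced_z(R)
    return z * bessel_k(1, z)


def range_density(R):
    """Asymptotic distribution psi(R) = (z^2 / 2) K_0(z)."""
    z = reduced_z(R)
    return 0.5 * z * z * bessel_k(0, z)
```

A quick consistency check is that the numerical derivative of `range_probability` agrees with `range_density`, since $\psi = \Psi'$.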

5. The Corner Test for Association. John W. Tukey, Princeton University,
and Paul S. Olmstead, Bell Telephone Laboratories.

Construction. In a scatter diagram, draw the two medians, that is, the median of the
x values without regard to the values of y, and the median of the y values without regard
to the values of x. Think of the four quadrants thus formed as being labelled +, −, +, −
in order, so that the two positive quadrants lie along one diagonal and the two negative
along the other. Beginning at the right-hand side of the diagram, count in along the observations
until forced to cross the horizontal median. Write down the number of observations
met before this crossing, attaching the sign, +, if they lay in the + quadrant,
and the sign, −, if they lay in the − quadrant. Repeat this process, moving up from
below, moving to the right from the left, and moving down from above. The quantity to
be used in the test is the algebraic sum of the four numbers thus written down.
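
The construction can be sketched in code as follows. This is a minimal illustration with assumptions that go beyond the abstract: no two x's and no two y's are alike, no observation lies exactly on a median (for instance an even number of points), the + quadrants are taken to be upper right and lower left, and the sign of each count is taken from the quadrant of the outermost observation. The function name `corner_sum` is hypothetical.

```python
from statistics import median


def corner_sum(xs, ys):
    """Algebraic sum of the four signed corner counts (illustrative sketch)."""
    mx, my = median(xs), median(ys)
    points = list(zip(xs, ys))

    def signed_run(key, reverse, side):
        # Walk in from one edge until an observation lies on the other
        # side of the median that must be crossed.
        ordered = sorted(points, key=key, reverse=reverse)
        first_side = side(ordered[0])
        count = 0
        for point in ordered:
            if side(point) != first_side:
                break
            count += 1
        x, y = ordered[0]
        sign = 1 if (x - mx) * (y - my) > 0 else -1   # + quadrants: upper right, lower left
        return sign * count

    def above(point):
        return point[1] > my      # side of the horizontal median

    def right(point):
        return point[0] > mx      # side of the vertical median

    return (signed_run(lambda p: p[0], True, above)      # in from the right
            + signed_run(lambda p: p[1], False, right)   # up from below
            + signed_run(lambda p: p[0], False, above)   # in from the left
            + signed_run(lambda p: p[1], True, right))   # down from above


increasing = corner_sum([1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7, 8])
decreasing = corner_sum([1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1])
```

For eight points on a rising line each of the four counts is +4, and for a falling line each is −4, so the two sums have the largest possible magnitude for that sample size.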

Distribution. The exact distribution of this quantity when no association is present
and no two x's and no two y's are alike is almost independent of sample size over the range
of values where it is apt to be used. For example, a sum of 9 or more is expected less than
one time in ten for all samples of size 6 or more, a sum of 16 or more, less than one time in
100 for all samples of size 10 or more, and a sum of 21 or more, less than one time in 1000
for all samples of size 14 or more. Even for infinite sample size, the sums for these fractions
become only 9, 14, and 19, respectively.

Extensions. The same ideas that underlie the outside corner test for two variables
may be extended in several ways to give tests for various types of association among three
or more variables.

6. Consistent Estimates Based on Partially Consistent Observations, with
Particular Reference to Structural Relations. J. Neyman and Elizabeth
L. Scott, University of California.

Let $\{X_n\}$ be a sequence of independent random variables and let $F_i$ denote the distribution
of $X_i$. Each distribution $F_i$ is assumed to depend on unknown parameters. If a
parameter $\theta$ appears in an infinity of distributions $F_i$, it is called structural. Otherwise
it is incidental. The sequence $\{X_n\}$ is called consistent if $\{F_n\}$ has no incidental parameters.
$\{X_n\}$ is called partially consistent if $\{F_n\}$ has both structural and incidental parameters.

The problem of fitting a straight line when both variables are subject to errors is that of a
partially consistent series of observations. Let $\xi$ and $\eta = \alpha + \beta\xi$ be two linearly connected
quantities, perhaps related to particular stars, where $\alpha$ and $\beta$ are unknown. The values
$\xi_i$ and $\eta_i$ corresponding to the $i$th star $(i = 1, 2, \ldots, s)$ are unknown. The observations
provide measurements $x_{ij}$ of $\xi_i$ $(j = 1, 2, \ldots, m_i)$ and measurements $y_{ik}$ $(k = 1, 2, \ldots, n_i)$
of $\eta_i$. Both $m_i$ and $n_i$ are bounded and small. On the other hand, $s$ may be
considered as increasing without limit.

Assume that the $x_{ij}$ and the $y_{ik}$ are normally
distributed with variances $\sigma_1^2$ and $\sigma_2^2$ and means $\xi_i$ and $\eta_i$, respectively. Then the totality
of observations will form a partially consistent system with the structural parameters $\alpha$, $\beta$,
$\sigma_1$ and $\sigma_2$ and with $\xi_i$ as incidental parameters.

If the observable random variables are only
partially consistent, then the maximum likelihood estimates of the structural parameters
(a) need not be consistent, (b) even if they are consistent and asymptotically normal,
alternative estimates may exist which have the same properties but smaller asymptotic
variances.

Consistent estimates of structural parameters may be obtained from “modified”
equations of maximum likelihood. The lower bound of the variance of estimates of
structural parameters, provided by the Cramér-Rao inequality, is attained only on certain
conditions which are both necessary and sufficient.
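
Statement (a) can be illustrated on a simpler partially consistent system than the straight-line problem of the abstract. The sketch below is a standard textbook illustration and is not taken from the abstract: it assumes each object $i$ contributes two normal measurements with its own mean $\xi_i$ (incidental) and a common variance $\sigma^2$ (structural), with no regression structure. The function name, the range of the $\xi_i$ and the seed are arbitrary choices. In this setting the maximum likelihood estimate of $\sigma^2$ converges to $\sigma^2/2$ rather than to $\sigma^2$ as the number of objects grows.

```python
import random


def paired_variance_mle(num_pairs, sigma, seed=0):
    """Maximum likelihood estimate of sigma**2 from pairs with unknown means.

    Assumed model for illustration: x_i1, x_i2 ~ N(xi_i, sigma**2),
    with a different incidental mean xi_i for every pair.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_pairs):
        xi = rng.uniform(-10.0, 10.0)       # incidental parameter of this pair
        x1 = rng.gauss(xi, sigma)
        x2 = rng.gauss(xi, sigma)
        pair_mean = (x1 + x2) / 2.0         # maximum likelihood estimate of xi
        total += (x1 - pair_mean) ** 2 + (x2 - pair_mean) ** 2
    return total / (2 * num_pairs)          # maximum likelihood estimate of sigma**2


estimate = paired_variance_mle(num_pairs=20000, sigma=2.0)
# The true variance is 4.0, but the estimate settles near 2.0.
```

Doubling the estimate gives a consistent one in this simple case, which is in the spirit of the “modified” equations mentioned in the abstract, although the abstract does not describe them in detail.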



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest 

Personal Items 

Dr. Paul H. Anderson has been appointed Economic Analyst with the Marketing
Division, Office of Domestic Commerce, Department of Commerce, Washington.

Dr. Gilbert W. Beebe is now with the Division of Medical Sciences, National
Research Council, Washington.

Professor Harald Cramér, Director of the Institute of Mathematical Statistics
of the University of Stockholm, was awarded the degree of Doctor of Science,
honoris causa, by Princeton University on February 22, 1947. Professor Cramér
has acted as Visiting Professor of Mathematics at Princeton University and
Yale University during the academic year 1946-’47. He will be at the University
of California at Berkeley during the 1947 Summer Session.

Dr. Paul M. Densen has accepted a position with the Division of Medical
Research Statistics, Bureau of Medicine and Surgery, Veterans Administration,
Washington.

Mr. M. V. Divatia is now in charge of the office of the Statistician and Eco¬ 
nomic Adviser and Under-Secretary to the Government of Sind, Karachi, 
India. 

Mr. Clarence B. Fine, formerly with the Office of Price Administration, has 
transferred to the Bureau of Old-Age and Survivors Insurance, Social Security 
Administration, where he is employed as a Sampling Expert. 

Prof. Charles C. Grove was appointed Visiting Lecturer in Mathematics at
the University of Pennsylvania for the spring semester.

Assoc. Prof. E. E. Haskins of Northeastern University has been appointed to
an assistant professorship at the Army Air Forces Institute of Technology,
Wright Field, Dayton, Ohio.

Prof. Roger Lessard of the Hull Technical School has accepted a position at
the Ecole Polytechnique, Montreal.

Mr. Edward D. Lowery is now a member of the Research Department, Win¬ 
chester Arms Company, New Haven, Connecticut. 

Professor H. B. Mann of Ohio State University has been awarded the Frank
Nelson Cole prize in the Theory of Numbers for 1946. 

Dr. Margaret P. Martin has been appointed to an assistant professorship in 
the Department of Preventive Medicine and Public Health, Vanderbilt Uni¬ 
versity Medical School, Nashville, Tennessee. 

Dr. A. L. O’Toole is at present employed by the Veterans Administration in
the Washington headquarters, as Acting Chief of the Administrative Analysis
Division in the Research Service. Dr. O’Toole was released from the Navy on
September 23, 1946, to inactive duty in the U. S. Naval Reserve, with the rank
of Commander. Dr. O’Toole served for nearly four years in the Navy, in
important administrative and statistical work for the Commander South Pacific
Area and South Pacific Force. He will be remembered as having been with
Admiral Halsey’s Pacific Fleet, and was awarded the Bronze Star Medal. At
the time of his release, he was Chief Staff Officer for Commander South Pacific
Area and South Pacific Force.

Mr. I. B. Perrott, since his demobilization from the British Army, has been
Lecturer in Mathematics at the College of Technology and Commerce, Leicester,
England.

Mr. J. S. Ripandelli is now with the Actuarial Department of the Jefferson
Standard Life Insurance Company of Greensboro, North Carolina.

Dr. Ronald W. Shephard of the University of California has been appointed
to the staff of the Department of Mathematics, New York University.

Mr. John R. Stehn is now a member of the Research Laboratory of the General
Electric Company, Schenectady, New York.

Dr. Charles W. Vickery, formerly of Ohio State University, is engaged in work
as a Research Consultant in New York City.


Miss Margaret Jeannin Dix, of the University of California Statistical Laboratory,
died an accidental death at her home in Berkeley on June 20, 1946.

Mr. Albert M. Freeman, of the Boston Fiduciary and Research Association,
died May 20, 1946.

Dr. Walter Schilling, of the Stanford University Hospital, died suddenly in
San Francisco, December 16, 1946.


Summer Statistical Session at the University of California at Berkeley 

The important advances in the theory of statistics during the war and especially
the unprecedented growth in the fields of application have created a
strong demand for trained statisticians to fill both the research and the teaching
positions all over the country. Since in many cases the war time education had
to be somewhat sketchy, unsystematic, and not very conducive to a thorough
coverage of the vast material, it is felt that a relatively brief set of courses on a
rather advanced level would be beneficial to many persons, both those who already
hold research or teaching positions in statistics, as well as those who
prepare for higher degrees.

With this object in mind, the University of California at Berkeley is offering
a set of statistical courses during the Summer Session, June 23rd to August 2nd,
1947. There will be three courses: (i) General Theory of Random Variables and
Frequency Distributions, by Harald Cramér of the University of Stockholm;
(ii) Problems of Testing Hypotheses and of Estimation, by J. Neyman, University
of California, Berkeley; and (iii) Seminar Course. The last will be given by
seven scholars, each giving two hours of lectures, as follows:


1. Statistical Astronomy                                      J. Trumpler
2. Orthogonal Polynomials and Problems of Moments             Szegö
3. Methods of Calculation                                     F. Lenzen
   (a) Gibbs’ Methods in Statistical Mechanics.
   (b) Darwin-Fowler Method of Statistics.
4. Large Scale Sampling Surveys                               P. C. Mahalanobis
5. Statistical Problems Arising in Nuclear Physics
   Measurements                                               R. Serber
6. Problems of Population Genetics                            S. Emerson
7. Interactions between Industrial Problems and
   Mathematical Statistics                                    H. Scheffé

The purpose of the Seminar Course is to introduce the students either to
branches of pure mathematics contingent on mathematical statistics but not
ordinarily taught in the universities or to various fields of knowledge offering
fruitful fields for statistical studies.


Summer Statistical Session at Virginia Polytechnic Institute 

A Summer Statistical Session will be held at Virginia Polytechnic Institute,
Blacksburg, Virginia, August 5 to September 5, 1947. This Session will be
sponsored jointly by Virginia Polytechnic Institute, University of North Carolina,
University of Michigan, Iowa State College, and the Federal Bureau of
Agricultural Economics.

The faculty will consist of: Walter A. Hendricks, B.A.E., U.S.D.A.; Rensis
Likert, University of Michigan; H. L. Lucas, University of North Carolina;
Maurice G. Kendall, England; George W. Snedecor, Iowa State College; Frank
Yates, Rothamsted Experiment Station, England; Earl E. Houseman, B.A.E.,
U.S.D.A.; Raymond J. Jessen, Iowa State College; and Boyd Harshbarger,
Virginia Polytechnic Institute.

The following courses will be offered for credit: Engineering Statistics; Statistical
Methods; Design of Animal Experiments; Schedule Design and Interview
Techniques for Sample Surveys; Sampling Design and Analysis; Mathematical
Theory of Sampling; Seminar; Mathematical Statistics; and Experimental
Design.

In addition to the faculty, probable Seminar speakers are: W. F. Callendar,
W. G. Cochran, Miss Gertrude M. Cox, W. E. Deming, George Gallup, M. H.
Hansen, Harold Hotelling, Arnold King, and Charles F. Sarle.

Inquiries regarding the Summer Session should be addressed to Boyd Harshbarger,
Professor of Statistics, Summer Statistical Session, Virginia Polytechnic
Institute, Blacksburg, Virginia.
New Members

The following persons have been elected to membership in the Institute
(January 1 to February 28, 1947)

Asofsky, Samuel, B.S. (C.C.N.Y.) Stat., National Jewish Welfare Board, 1256 B, is St.
Brooklyn SO, N. Y.

Auer, Richard M., A.M. (Columbia) Instr. in Math., State Teachers Coll., Montclair,
N. J., 88 No. 16 St., East Orange

Bakan, David, M.A. (Indiana) Chief Stat., Comm. on Selection and Training of Aircraft
Pilots, National Research Council, 259 Natatorium, Ohio State Univ., Columbus 10,
Ohio

Beatty, Glenn H., A.B. (Ohio State) Grad. student and Fellow, Iowa State College, Station
A, General Delivery, Ames, Iowa

Campbell, Wallace A., B.S. (Columbia) Stat. Analyst, War Assets Administration, 488
Washington Ave., Brooklyn 16, N. Y.

Celia, Francis R., M.A. (Kentucky) Assoc. Prof. of Statistics and Director, Bur. of Business
Research, Univ. of Oklahoma, Norman, Okla.

Chapman, Douglas G., M.A. (Toronto) Asst. Prof. of Math., Univ. of British Columbia,
Vancouver, Canada

Cheydleur, Benjamin F., B.A. (Wisconsin) Chief, Mechanized Analysis, Naval Ordnance
Lab., 602 Avenue E, District Heights, Washington 19, D. C.

Coombs, Clyde H., Ph.D. (Chicago) Ass’t Prof. of Psychology, and Research Psychologist,
Institute for Human Adjustment, Univ. of Michigan, Ann Arbor, Mich., 1027 E.
Huron

Corton, Edward L., Jr., M.B.A. (Chicago) Grad. student, Iowa State Coll., 80S Hodge
Ave., Ames, Iowa

Davis, Harold, A.B. (Brooklyn Coll.) Stat., Navy Dept., 416 — 33 St., S. E., Washington,
D. C.

Dutton, Arthur M., B.S.E.E. (Iowa State) Grad. Fellow, Mathematics Dept., Iowa State
Coll., Ames, Iowa

Fay, Edward A., A.M. (Harvard) Grad. student, Univ. of California, Berkeley, 415 South
17th St., Apt. SB, Richmond, Calif.

Flanagan, John C., Ph.D. (Harvard) Prof. of Psychology, Univ. of Pittsburgh,
Pittsburgh 13, Pa.

Gardner, Eric F., Ed.M. (Boston Teachers) Teaching Fellow and Milton Fellow, Grad.
School of Educ., Harvard Univ., Cambridge, Mass., Walker House, 40 Quincy St.

Gerende, Lincoln J., C.Ph.M., U. S. Navy, Naval Medical Res. Institute, National Naval
Medical Center, Bethesda 14, Md.

Grossman, Evelyn, M.A. (Columbia) Stat., U. S. Dept. of Agriculture, 6401 — 14 St.,
N. W., Washington 12, D. C.

Hill, Edwin A., Jr., M.A. (Columbia) Instr. in Math., Coll. of the City of N. Y., 50 West
67 St., New York 23, N. Y.

Horton, H. Burke, M.B.A. (Texas) Senior Transport Analyst, 2906 Naylor Rd., S. E.,
Washington 20, D. C.

Horvitz, Daniel G., B.S. (Mass. State) Grad. student, Iowa State Coll., 2131 Country Club
Blvd., Ames, Iowa

Ikhtiar-ul-Mulk, S. M., M.A. (Punjab, India) Grad. student, Princeton Univ., Graduate
College, Princeton, N. J.

Jaeger, Carol M., B.A. (Dubuque) Statistician, 1300 Columbia Terrace, Peoria 5, Ill.

Jessen, Raymond J., Ph.D. (Iowa State) Res. Assoc. Prof., Iowa State College, and
Agric. Statistician, U.S.D.A., Statistical Lab., Iowa State Coll., Ames, Iowa

Kinzer, Mrs. Lydia Greene, M.A. (Kansas) Ass’t Instr. in Math., Ohio State Univ.,
585 East Town Street, Columbus 15, Ohio
Langenhop, Carl E., M.S. (Iowa State) Instr. in Math., Iowa State Coll., Apt. 3, Cranford
Annex, Ames, Iowa

Lowy, Melitta E., A.B. (Hunter) Statistician, Grad. student, Columbia Univ., 645 West
End Ave., New York 25, N. Y.

Mattila, Sakari, Fil. Mag. (Helsinki) High School of Commerce, Helsinki, Finland

Mayerson, Allen L., B.S. (Michigan) Grad. student and Teaching Fellow, Univ. of Mich.,
ISOS Packard St., Ann Arbor, Mich.

McCreary, Garnet E., M.A. (Queen’s Univ.) Research Fellow, Statistical Lab., Iowa
State Coll., Ames, Iowa

McMillan, Olan T., M.A. (Michigan) Instr. in Math., Michigan State Coll., East Lansing,
Mich.

Morris, Edward B., A.B. (Indiana) Statistician, U. S. Bur. of Labor Statistics, 1915 Ridge
Place S. E., Washington SO, D. C.

Moshman, Jack, B.A. (New York) Tutor in Math., Queens Coll., Flushing, N. Y., 125-09
Liberty Ave., Richmond Hill 19

Natrella, Mrs. Mary G., B.A. (Pennsylvania) Statistician, Bureau of Ships, Navy Dept.,
1210—ISih St., N. W., Washington E, D. C.

Neal, T. Ellison, A.B. (Geo. Washington) Statistician, Textile Dev. Dept., U. S. Rubber
Co., Hogansville, Ga.

Noble, Carl E., Ph.D. (Iowa) Quality Methods Engineer, Kimberly Clark Corp.,
Lakeview Mill, Neenah, Wis.

Ostle, Bernard, M.A. (British Columbia) Teaching Ass’t, School of Bus. Adm., Univ. of
Minnesota, Minneapolis, Minn.

Oxtoby, Toby E., B.A. (Iowa) Grad. Ass’t, Dept. of Psychology, State Univ. of Iowa,
Iowa City, Iowa

Peisakoff, Melvin P., Student, Princeton Univ., 34 North West College, Princeton, N. J.

Rothschild, Colette, (Ecole Normale Superieure) Attachee de Recherches au Centre National
de la Recherche Scientifique, 43 rue Madame, Paris VIe, France

Slonim, Morris J., M.B.A. (Harvard) Statistician, Bureau of Labor Statistics, 210 Wayne
Place S. E., Washington SO, D. C.

Soler, Reuben I., B.B.A. (C.C.N.Y.) Statistician, Food and Drug Administration, SIfi
Portland St., S. E., Washington, D. C.

Stouffer, Samuel A., Ph.D. (Chicago) Prof. of Sociology and Director of the Laboratory
of Social Relations, Emerson Hall, Harvard Univ., Cambridge, Mass.

Teicher, Henry, B.A. (Iowa) Graduate student, Columbia Univ., 139 Osborne Terrace,
Newark, N. J.

Tiedeman, David V., M.A. (Rochester) Instr. in Educ., Grad. School of Educ., Harvard
Univ., Walker House, 40 Quincy St., Cambridge 38, Mass.

Tintner, Gerhard, Ph.D. (Vienna) Prof. of Economics and Mathematics, Iowa State
Coll., Ames, Iowa

Weiss, Eleanor S., Ed.M. (Boston Teachers) Teaching Fellow, Grad. School of Educ.,
Harvard Univ., S005 Commonwealth Ave., Brighton SB, Mass.

Wilson, William A., Jr., A.B. (California) Teaching Ass’t in Psychology, Univ. of Calif.,
Berkeley 4, Calif.

Woodell, Allan D., A.B. (N. Y. State Teachers, Albany) Graduate student in math., Univ.
of Mich., 425 Church St., Ann Arbor, Mich.


Omitted from 1946 lists of new members:

Feraud, Prof. Lucien, Faculte des Sciences Economiques et Sociales, Univ. de Geneve,
24 rue Henri Mussard, Geneve, Switzerland



REPORT ON THE ATLANTIC CITY MEETING OF THE INSTITUTE 

The Ninth Annual Meeting of the Institute of Mathematical Statistics was
held at Atlantic City, New Jersey, on Friday and Saturday, January 24-25, 1947.
The meeting was held in conjunction with meetings of the American Economic
Association, American Statistical Association, and the Econometric Society.
The following 154 members of the Institute attended the meeting:

Beatrice Aitchison, F. L. Alt, R. L. Anderson, T. W. Anderson, K. J. Arrow, Max Astrachan,
B. M. Bennett, Joseph Berkson, A. J. Berman, C. I. Bliss, Paul Boschan, A. E.
Brandt, M. F. Bresnahan, Philip Brown, O. P. Bruno, R. W. Burgess, O. K. Burns, B. H.
Camp, F. R. Celia, Uttam Chand, K. L. Chung, C. W. Churchman, P. C. Clifford, W. J.
Cobb, W. G. Cochran, F. G. Cornell, D. R. Cowan, Harald Cramér, J. H. Curtiss, J. F. Daly,
G. B. Dantzig, D. G. Delhi, D. B. DeLury, B. W. Dempsey, H. F. Dorn, F. W. Dresch,
A. J. Duncan, David Durand, P. S. Dwyer, Churchill Eisenhart, W. D. Evans, Will Feller,
C. D. Petris, Irving Fisher, L. R. Frankel, M. A. Geisler, Leon Gilford, M. A. Girshick,
C. H. Graves, K. E. Greene, S. W. Greenhouse, F. E. Grubbs, E. J. Gumbel, Margaret
Gurney, Louis Guttman, Trygve Haavelmo, K. W. Halbert, M. H. Hansen, Miriam S.
Harold, T. E. Harris, Boyd Harshbarger, Bernard Hecht, Wassily Hoeffding, H. B. Horton,
Harold Hotelling, E. E. Houseman, Helen M. Humes, Leonid Hurwicz, Seymour Jablon,
R. W. James, R. J. Jessen, H. L. Jones, Alice S. Kaitz, H. B. Kaitz, L. S. Kellogg, H. S.
Konijn, Tjalling Koopmans, C. F. Kossack, R. L. Kozelka, D. H. Leavens, Howard Levene,
J. E. Lieberman, Rensis Likert, S. B. Littauer, Irving Lorge, P. J. McCarthy, P. W. McGann,
F. E. McIntyre, H. F. MacNeish, J. D. Maddrill, Jacob Marschak, Max Millikan,
A. M. Mood, Mrs. Margaret Moore, J. W. Morse, J. E. Morton, Frederick Mosteller, D. N.
Nanda, P. M. Neurath, Jerzy Neyman, M. L. Norden, Nilan Norris, H. W. Norton, P. S.
Olmstead, E. G. Olds, Sophie Rakesky, Chester Rapkin, Olav Reiersøl, W. A. Reynolds,
P. R. Rider, C. F. Roos, A. C. Rosander, Ernest Rubin, Herman Rubin, P. J. Rulon, Frank
Saidel, Marion M. Sandomire, Max Sasuly, F. E. Satterthwaite, E. D. Schell, E. M. Schrock,
D. H. Schwartz, G. R. Seth, L. W. Shaw, W. A. Shewhart, J. H. Smith, R. T. Smith, Leslie
E. Simon, Milton Sobel, C. M. Stein, G. T. Steinberg, Joseph Steinberg, H. W. Steinhaus,
F. F. Stephan, A. P. Stergion, M. S. Stevens, G. J. Stigler, S. A. Stouffer, Zenon Szatrowski,
B. J. Tepping, J. W. Tukey, D. F. Votaw, Jr., Helen M. Walker, J. H. Watkins, Louis
Weiner, Samuel Weiss, S. S. Wilks, Elizabeth W. Wilson, C. P. Winsor, J. Wolfowitz, M. A.
Woodbury, Holbrook Working, C. A. Wright, and T. O. Yntema.

The first session, a joint session with the Econometric Society and the Biometrics
Section of the American Statistical Association, was held at two o’clock
on Friday afternoon, and was devoted to the topic, Applications of Statistical
Techniques to Agricultural Economics. Holbrook Working of Stanford University
presided. The following four papers were presented:

1. Use of Variance Components in the Analysis of Market Differentials in Hog Prices.
R. L. Anderson, University of North Carolina.

2. An Application of the Analysis of Variance in the Economic Evaluation of Production.
Boyd Harshbarger, Virginia Polytechnic Institute.

3. A Model of the Economic Interdependence between Agriculture and the National Economy.
Trygve Haavelmo, Cowles Commission for Research in Economics.

4. The Reduced-Form Method for Estimating Simultaneous Economic Relationships.
M. A. Girshick, Bureau of the Census.

The session concluded with a discussion of these papers by T. W. Anderson,
Columbia University; Milton Friedman, University of Chicago; and Harold
Hotelling, University of North Carolina.

At 8 o’clock on Friday evening there was a joint session with the Econometric
Society and the American Statistical Association, on the topic, When is the
Analysis of Variance Useful in Economic Research? Arthur R. Tebbutt of
Northwestern University presided, and the following three papers were presented:

1. The Advantages of the Analysis of Variance for Research and Managerial Control
Purposes.
Harry Pelle Hartkemeier, University of Missouri.

2. Estimation of Economic Relationships and Multivariate Regression.
Leonid Hurwicz, Iowa State College.

3. Nonstandard Forms of Variance Analysis.
W. Allen Wallis, University of Chicago.

There was discussion of these papers by Tjalling Koopmans, Cowles Commission
for Research in Economics; Gerhard Tintner, Iowa State College; and J. W.
Tukey, Princeton University.

At 10 o’clock on Saturday morning there was a joint session with the American
Statistical Association devoted to the topic, Use of Ordered Observations in
Statistical Analysis, with Harold Hotelling of the University of North Carolina
as chairman. The following two papers were presented:

1. Estimation of Parameters by Use of Order Statistics.
Frederick Mosteller, Harvard University.

2. Tolerance Limits.
Jacob Wolfowitz, Columbia University.

There was discussion of these papers by John H. Smith, Bureau of Labor Statistics;
Howard L. Jones, Illinois Bell Telephone Company; and J. W. Tukey,
Princeton University.

At the Saturday morning session one contributed paper of the Institute of 
Mathematical Statistics was also presented, by E J. Gumbel, Newark College 
of Engineering, on the topic The Asymptotic Distribution of the Range. 

The Institute’s session at 2 o’clock Saturday afternoon was devoted to contributed
papers. W. G. Cochran, president of the Institute, presided, and the
following four papers were presented:

1. A Test of Significance of the Coefficient of Rank Correlation for More than Thirty Ranked
Items.
Nilan Norris, Hunter College.

2. A Generalized T Measure of Multivariate Dispersion.
Harold Hotelling, University of North Carolina.

3. Asymptotic Properties of Maximum and Quasi-Maximum Likelihood Estimates.
Herman Rubin, Cowles Commission for Research in Economics.

4. The Corner Test for Association.
J. W. Tukey, Princeton University, and Paul Olmstead, Bell Telephone Laboratories.

Abstracts of these papers appear elsewhere in this issue.

Following the session on contributed papers, Professor Jerzy Neyman of the
University of California gave an invited address on the topic: On Consistent
Estimates, with Particular Reference to Structural Relations between Several Variables
all Subject to Random Error. A discussion of this address followed, by
Miss E. L. Scott, University of California; A. Wald, Columbia University; and
Tjalling Koopmans, Cowles Commission for Research in Economics.

The meeting closed with the annual business meeting of the Institute, which
was held at 5 p.m. on Saturday in Haddon Hall. Reports by the President,
Secretary-Treasurer, and Editor were followed by the election of officers for
1947: Will Feller, President; Morris H. Hansen and John H. Curtiss, Vice-Presidents;
and Paul S. Dwyer, Secretary-Treasurer.

P. S. Dwyer,
Secretary.

Introduction. If $n$ real variables $x_1, x_2, \cdots, x_n$ are subject to a probability
distribution with the element $dV_1(x_1)\,dV_2(x_2) \cdots dV_n(x_n)$ one can ask for the
distribution of any function $f$ of $x_1, x_2, \cdots, x_n$. We are primarily interested in
statistical functions, i.e. in functions that depend on the repartition$^1$ $S_n(x)$ of the
$n$ quantities $x_1, x_2, \cdots, x_n$ only. The simplest case is that of the linear statistical
functions

(1)    $f = \int \psi(x)\,dS_n(x) = \frac{1}{n}\,[\psi(x_1) + \psi(x_2) + \cdots + \psi(x_n)].$

$^1$ The function $S_n(x)$ is called the repartition of the real quantities $x_1, x_2, \cdots, x_n$ if
$nS_n(x)$ is the number of those among the $x_1, x_2, \cdots, x_n$ that are smaller than or equal to $x$.

The so-called Central Limit Theorem of Probability Calculus states that the
distribution of a linear statistical function, if $n$ tends to infinity, approaches
more and more the normal (Gauss) distribution if some very general conditions
linking $\psi(x)$ and the $V_\nu(x)$ are fulfilled. It was shown ten years ago [2]
that the restriction to linear functions here is immaterial. Much more general
statistical functions tend towards normalcy with increasing $n$, for example the
variance of $m$th order

(2)    $f = M_m = \int (x - a)^m\,dS_n(x), \qquad a = \int x\,dS_n(x)$

and, likewise, such combinations as the Lexis quotient $M_2/a(1 - a/N)$ or Gini’s
disparity measure $1 - \int (1 - S_n)^2\,dx/a$ or, in the multidimensional case, the
correlation coefficient, etc. On the other hand, statistical functions are known
whose distributions assume, asymptotically, a form different from the Gaussian.
One example is Pearson’s Chi-square, another the test function $\omega^2$, introduced
by H. Cramér [1] and the author [4]:

(3)    $f = \omega^2 = \int g'(x)\,[S_n(x) - \bar V_n(x)]^2\,dx$

where $g'(x) > 0$ and

(4)    $\bar V_n(x) = \frac{1}{n}\,[V_1(x) + V_2(x) + \cdots + V_n(x)].$

N. V. Smirnoff [7, 8] computed the asymptotic distribution of $\omega^2$ for the case
that all $V_\nu(x)$ and, therefore, $\bar V_n(x)$ equal one and the same distribution function
$V(x)$. The result differs widely from the Gaussian distribution.
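
The definitions given so far can be made concrete on a small sample. The following Python sketch is an illustration only and not part of the original argument; the function names (`repartition`, `linear_statistic`, `central_moment`) and the sample values are assumptions chosen for the example. It evaluates the repartition $S_n(x)$, a linear statistical function as in eq. (1), and the quantity of eq. (2).

```python
from bisect import bisect_right


def repartition(sample):
    """Return S_n as a function: the fraction of sample values <= x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def linear_statistic(sample, psi):
    """Integral of psi dS_n, i.e. the average of psi over the sample (eq. (1))."""
    return sum(psi(x) for x in sample) / len(sample)


def central_moment(sample, m):
    """Eq. (2): integral of (x - a)^m dS_n, with a the integral of x dS_n."""
    a = linear_statistic(sample, lambda x: x)
    return linear_statistic(sample, lambda x: (x - a) ** m)


# Hypothetical sample, used only to exercise the definitions.
sample = [2.0, 4.0, 4.0, 5.0]
S = repartition(sample)
mean = linear_statistic(sample, lambda x: x)
second = central_moment(sample, 2)
```

Since $S_n$ is a step function with a jump of $1/n$ at each observation, every integral with respect to $dS_n$ reduces to a plain average over the sample, which is all the sketch relies on.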

In order to understand all this it is necessary to consider $f$ as a function defined
in the space of distributions $V(x)$ (or in a sub-space of it). Then, the variable
$f$ whose distribution is sought is the value of $f\{V(x)\}$ at the “point” $S_n(x)$
and should be written as $f\{S_n(x)\}$. Such “functions of functions” were first
introduced by Vito Volterra (1887) and are today a familiar topic of higher
analysis. The first statement that can be made is that the asymptotic distribution
of $f\{S_n(x)\}$ depends mainly on the behavior of $f\{V(x)\}$ at the point
$\bar V_n(x)$ defined by (4).

Volterra also introduced the notion of derivatives and of Taylor development
for a “fonction de ligne.” Using these concepts a more specific statement can
be pronounced: The type of asymptotic distribution of a differentiable statistical
function $f\{S_n(x)\}$ depends on which is the first non-vanishing term in the Taylor
development of $f\{V(x)\}$ at the point $\bar V_n(x)$; if it is the linear term the limiting
distribution is normal, under restrictions that can easily be derived from the Central
Limit Theorem; in other cases higher types of asymptotic distributions result.

The present paper tries to establish this theorem and to furnish preliminary
information about the asymptotic distribution of the second type.

If both the function $f\{V(x)\}$ and the sequence of distributions $V_1(x), V_2(x),
V_3(x), \cdots$ are defined independently of each other, it cannot be presumed that
the derivative of $f$ vanishes at $\bar V_n(x)$. In this sense the normal distribution appears
as the “general case” of an asymptotic distribution while the higher types
represent certain “singularities.” In the case of type $m$, $(m = 1, 2, 3, \cdots)$,
the distribution of the expression

(5)    $n^{m/2}\,[f\{S_n(x)\} - f\{\bar V_n(x)\}]$

tends towards a function of bounded mean value and variance. For $m = 1$
it is a Gauss function with mean value 0 and finite variance. For any uneven
$m$ the distribution is symmetrical with respect to the zero point. If $f$ is given,
the limiting distribution is essentially determined if in addition to $\bar V_n(x)$ one function
of two variables, $U_n(x, y)$, is known,

(6)    $U_n(x, y) = \frac{1}{n}\sum_{\nu=1}^{n} [V_\nu(x) - V_\nu(x)V_\nu(y)], \quad (x \le y)$
       $\phantom{U_n(x, y)} = \frac{1}{n}\sum_{\nu=1}^{n} [V_\nu(y) - V_\nu(x)V_\nu(y)], \quad (x \ge y).$

For instance, in the case of the linear function ($m = 1$) defined in eq. (1), the
(second order) variance of (5) is found as the Stieltjes integral

(7)    $\iint \psi(x)\psi(y)\,dU_n(x, y)$

and no mean values of higher order are required for computing the moments of
any order, whatever $m$ is.

For $m = 2$ the complete expression for the characteristic function of the asymptotic
distribution of (5) is developed in Part III of this paper. It has the form


( 8 ) 


1 

D{ui ) 


where $D(\lambda)$ is in general the Fredholm determinant of a symmetrical kernel that
depends on the second derivative of $f\{V(x)\}$ at $V = \bar V_n$, on $\bar V_n$ and on $U_n$.
If the $V_\nu(x)$ are discontinuous distributions with saltus at $k$ distinct points only,
$D$ is the determinant of a quadratic form of $k$ variables. This happens to be
the case with Pearson’s $\chi^2$ while the $\omega^2$ distribution found by Smirnoff represents
a fairly general case of the asymptotic distribution of second type.


PART I. PRELIMINARY THEOREMS 

1. Asymptotically equal distributions. Let $K_1, K_2, K_3, \cdots$ be an infinite
sequence of collectives, $k_n$ the number of variables in $K_n$ and $A_n$, $B_n$ two functions
of these variables, $(n = 1, 2, 3, \cdots)$. The cumulative distribution functions
of $A_n$ and $B_n$ will be denoted by $P_n(x)$ and $Q_n(x)$ respectively, i.e.

(1)    $P_n(x) = \mathrm{Prob}\,\{A_n \le x\}, \qquad Q_n(x) = \mathrm{Prob}\,\{B_n \le x\}$

and the expectation of $|A_n - B_n|$ by

(2)    $E_n\{|A_n - B_n|\}$

all these quantities being taken with respect to the distribution in $K_n$.

Two functions $F_n(x)$ and $G_n(x)$ both depending on the parameter $n$ are said
to be asymptotically equal if

(3)    $\lim_{n=\infty} |F_n(x) - G_n(x)| = 0$ uniformly in $x$.

If this is the case for the cumulative distribution functions $P_n(x)$ and $Q_n(x)$ of
$A_n$ and $B_n$ we shall also say that $A_n$ and $B_n$ have the same asymptotic distribution.
Eq. (3) will also be written as $F_n(x) \sim G_n(x)$. The following can be
proved:

Lemma A. If with increasing $n$ the expectation of the absolute difference between
$A_n$ and $B_n$ tends towards zero and if one of the functions $P_n(x)$ or $Q_n(x)$ is
asymptotically equal to a function $F_n(x)$ that has a uniformly bounded derivative,
i.e.

(4)    $\lim_{n=\infty} E_n\{|A_n - B_n|\} = 0, \qquad \frac{dF_n(x)}{dx} < M$ for all $x$,

then $A_n$ and $B_n$ have the same asymptotic distribution.

This statement, in a slightly different wording, was proved in an earlier paper
[2] and the proof will not be repeated here. If one of the various definitions for
“stochastical convergence” is used, one can also say that $A_n$ and $B_n$, under the
stated conditions, converge stochastically towards each other.

Lemma A can be extended and modified in various ways. First, it is
obvious that the expectation of $|A_n - B_n|$ can be replaced by that of any
positive power $|A_n - B_n|^h$. With respect to $F_n$ one could ask for the existence
of a bounded derivative in all points except for a zero set only. Then $P_n$ and
$Q_n$ would still converge everywhere except for this zero set and the definition
of asymptotically equal distributions could be extended to this case. In the
present paper this will not be done as it is not our purpose to strive for results
of the greatest possible generality.

2. Special class of statistical functions: quantics. Preliminary to the study
of general statistical functions a special class which corresponds to quantics
(homogeneous polynomials) of $m$th order must be discussed. Let $V_1(x), V_2(x),
V_3(x), \cdots$ be the cumulative distribution functions in a sequence of one-dimensional
collectives $C_1, C_2, C_3, \cdots$ and $S_n(x)$ the repartition of a sample drawn
from the $n$-dimensional collective $K_n$, with the distribution element

$dV_1(x_1)\,dV_2(x_2) \cdots dV_n(x_n).$

We introduce

(5)    $T_n(x) = S_n(x) - \bar V_n(x), \qquad \bar V_n(x) = \frac{1}{n}\sum_{\nu=1}^{n} V_\nu(x).$

Here, $nT_n(x)$ is obviously the excess of observed values $\le x$ over their expected
number. Quantics of first, second, third, $\cdots$ order are then defined as

(6)    $f_1\{S_n(x)\} = \int \psi(x)\,dT_n(x)$
       $f_2\{S_n(x)\} = \iint \psi(x, y)\,dT_n(x)\,dT_n(y)$
       $f_3\{S_n(x)\} = \iiint \psi(x, y, z)\,dT_n(x)\,dT_n(y)\,dT_n(z)$

all integrals to be extended over the total range of $x$. Of course, only such $\psi$
for which the respective integral exists are admitted. The first, $f_1$, is obviously
a linear statistical function and the asymptotic distribution of $\sqrt{n}\,f_1$ is, under
well-known conditions, a Gauss function with the mean value zero and the
variance given in eq. (7) of the Introduction. In $f_2, f_3, \cdots$ the $\psi$ may be
supposed to be symmetrical with respect to their variables. It will be seen
later (Part II, sec. 2) that the first derivative of $f_2$, the first and second derivatives
of $f_3$, etc. vanish at the point $\bar V_n(x)$.

All the above functions $f_1, f_2, f_3, \cdots$ can be considered (if the $\psi$ are continuous)
as the limits of ordinary quantics in $k$ variables. Choose $k$ disjoint intervals
$I_1, I_2, \cdots, I_k$ on the $x$-axis, and call $I_{k+1}$ their complement. Denote the
increment of $V_\nu(x)$ within $I_\kappa$ by $p_{\nu\kappa}$ and the increment of $S_n(x)$ by $p_{n\kappa}$.
Obviously $p_{\nu\kappa}$ is the probability, within $C_\nu$, of $x$ falling in the interval $I_\kappa$ and $np_{n\kappa}$
is the number of observed sample values in the same interval. We introduce
the excess values

(7)    $\xi_\kappa = p_{n\kappa} - \bar p_{n\kappa}, \qquad \bar p_{n\kappa} = \frac{1}{n}\sum_{\nu=1}^{n} p_{\nu\kappa}$

and form the sums

(8)    $f_1 = \sum_{\kappa=1}^{k} \psi_\kappa\,\xi_\kappa, \qquad f_2 = \sum_{\kappa,\lambda}^{1 \cdots k} \psi_{\kappa\lambda}\,\xi_\kappa\,\xi_\lambda, \qquad f_3 = \sum \cdots.$

By selecting suitable sets of intervals $I_1, I_2, \cdots, I_k$ and appropriate values
for the constants $\psi_\kappa, \psi_{\kappa\lambda}, \cdots$ one can approximate the integrals (6) by sums
of the form (8).

Our next task will be to find asymptotic values for the expectation and for the
moments of the quantities defined in (8). Clearly a formula for the expectation
of a power product $\xi_1^\alpha \xi_2^\beta \xi_3^\gamma \cdots$, where $\alpha, \beta, \gamma, \cdots$ are positive integers, is the
only thing we need. To arrive at such a formula we replace each of the one-dimensional
collectives $C_\nu$ by a $k$-dimensional $C_\nu^*$ in the following way.

In $C_\nu^*$ the chance variable is a $k$-dimensional vector which can take $(k + 1)$
distinct values only: it can be zero or coincide with the unit vector parallel to
one of the $k$ axes. To the latter values of the variable we assign the probabilities
$p_{\nu 1}, p_{\nu 2}, \cdots, p_{\nu k}$ and to the zero the probability

(9)    $p_{\nu,k+1} = 1 - p_{\nu 1} - p_{\nu 2} - \cdots - p_{\nu k}.$

This quantity, of course, may vanish. The mean value of $C_\nu^*$ is the point with
the coordinates $p_{\nu 1}, p_{\nu 2}, \cdots, p_{\nu k}$.

If the $n$ collectives $C_1^*, C_2^*, \cdots, C_n^*$ are combined, the sum of the $n$ observed
vector values is a vector with the components $np_{n1}, np_{n2}, \cdots, np_{nk}$. If in
each $C_\nu^*$ the origin is shifted to the mean value and the coordinates with respect
to the new origin are called $z_1, z_2, \cdots, z_k$, the sums of the observed $z_1, z_2, \cdots, z_k$
values will be $n\xi_1, n\xi_2, \cdots, n\xi_k$ rather than $np_{n1}, np_{n2}, \cdots, np_{nk}$. Thus
it is seen that all questions concerning the distributions of $\xi_1, \xi_2, \cdots, \xi_k$ can
be answered on the basis of the well-known rules on the addition of $n$ independent
chance variables. This leads to the symbolic formula for the expectation:

(10)    $E_n\{(n\xi_1)^\alpha (n\xi_2)^\beta (n\xi_3)^\gamma \cdots\} = \Big(\sum_{\nu=1}^{n} Z_{\nu 1}\Big)^{\alpha} \Big(\sum_{\nu=1}^{n} Z_{\nu 2}\Big)^{\beta} \Big(\sum_{\nu=1}^{n} Z_{\nu 3}\Big)^{\gamma} \cdots,$

where on the right-hand side each term

(11)    $Z_{\nu\iota} Z_{\nu\kappa} Z_{\nu\lambda} \cdots$

has to be replaced by

(11')    $\int z_\iota z_\kappa z_\lambda \cdots dV_\nu^*(z).$

Here, obviously, $V_\nu^*(z)$ is the distribution function in $C_\nu^*$ and the expressions
(11') are in fact sums of $(k + 1)$ terms, for example

(12)    $\int z_1 z_2\,dV_\nu^*(z) = p_{\nu 1}(1 - p_{\nu 1})(-p_{\nu 2}) + p_{\nu 2}(-p_{\nu 1})(1 - p_{\nu 2}) + \sum_{\iota=3}^{k+1} p_{\nu\iota}(-p_{\nu 1})(-p_{\nu 2}) = -p_{\nu 1}p_{\nu 2}.$

It will be seen in the next section that only very few of these sums are needed
for computing the asymptotic value of (10). Note that the value of (11') can
be expressed in terms of $p_{\nu 1}, p_{\nu 2}, p_{\nu 3}, \cdots$ alone if $\xi_1, \xi_2, \xi_3, \cdots$ only appear in
the product.
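
The sum in (12) can be checked by direct enumeration of the $(k + 1)$ values of the chance variable. The following Python sketch is an illustration under stated assumptions: the function name `centered_second_moment` and the probabilities in `p` are hypothetical, and exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction


def centered_second_moment(p, i, j):
    """Expectation of z_i * z_j, where z is the vector variable minus its mean.

    The variable equals the unit vector e_t with probability p[t] and the
    zero vector with the remaining probability 1 - sum(p).
    """
    p_zero = 1 - sum(p)
    total = Fraction(0)
    for t in range(len(p)):
        z_i = (1 if t == i else 0) - p[i]
        z_j = (1 if t == j else 0) - p[j]
        total += p[t] * z_i * z_j
    # Contribution of the zero vector, whose centered coordinates are -p[i], -p[j].
    total += p_zero * (-p[i]) * (-p[j])
    return total


# Hypothetical probabilities for k = 3 intervals; the remainder 1/4 goes to zero.
p = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 4)]
off_diagonal = centered_second_moment(p, 0, 1)
diagonal = centered_second_moment(p, 0, 0)
```

For two different coordinates the result is the negative product of the two probabilities, as in (12); for equal coordinates the same enumeration gives $p(1 - p)$.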


3. Asymptotic expectation of excess-power products. We first consider the
case where the sum of exponents $\alpha, \beta, \gamma, \cdots$ is an even number

(13)    $\alpha + \beta + \gamma + \cdots = 2m.$

On the right-hand side of (10) stands a sum of $n^{2m}$ terms, each a product of $2m$
factors $Z_{\nu\kappa}$. It follows from (11') that the absolute value of a product cannot
surpass 1. The second subscripts are the same in each term: first $\alpha$ ones, then
$\beta$ twos, $\gamma$ threes, etc. The first subscripts are in each term a combination of
$2m$ digits out of $\nu = 1, 2, 3, \cdots, n$. The number of those combinations which
include $s$ different $\nu$-values, $(s = 1, 2, \cdots, 2m)$, is

(14)    $\binom{n}{s} K_s^{(m)}, \qquad K_s^{(m)} = s^{2m} - \binom{s}{1}(s-1)^{2m} + \binom{s}{2}(s-2)^{2m} - \cdots \pm \binom{s}{s-1}.$

Obviously, the $K_s^{(m)}$ are bounded (independent of $n$).
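
The count in (14) is an inclusion-exclusion count of arrangements of $2m$ places that use exactly $s$ of the $n$ available values. The following Python sketch is offered as a check only; the function names and the small values of $n$ and $m$ are assumptions made for the illustration. It compares the closed form with a brute-force enumeration.

```python
from itertools import product
from math import comb


def surjection_count(length, s):
    """Arrangements of `length` places that use every one of s given symbols."""
    return sum((-1) ** j * comb(s, j) * (s - j) ** length for j in range(s + 1))


def count_exactly_s_distinct(n, length, s):
    """Choose which s of the n values occur, then arrange them on all places."""
    return comb(n, s) * surjection_count(length, s)


def brute_force(n, length, s):
    """Direct enumeration of all n**length arrangements (small cases only)."""
    return sum(1 for seq in product(range(n), repeat=length) if len(set(seq)) == s)


# Hypothetical small case: n = 5 values, 2m = 4 places.
counts = [count_exactly_s_distinct(5, 4, s) for s in range(1, 5)]
```

The second factor depends on $s$ and $m$ only, which is the boundedness used in the text; the dependence on $n$ enters through the binomial coefficient alone.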

If $s > m$ the combination of first subscripts must include at least one $\nu$-value
that appears only once. All those products vanish since

(15)    $\int z_\kappa\,dV_\nu^*(z) = 0$ for all $\kappa, \nu$

due to the fact that the origin in the $z$-space coincides with the mean value of
the distribution $V_\nu^*(z)$. Note that

(16)    $\lim_{n=\infty} \frac{1}{n^m}\binom{n}{s} = 0 \quad (s < m), \qquad = \frac{1}{m!} \quad (s = m).$

It follows that the sum of all terms in (10) that correspond to any $s < m$ is
of the order $o(n^m)$ or smaller.

Thus, we arrive at an asymptotic expression for $E_n$ by dividing both sides of
(10) by $n^m$:

(17)    $n^m E_n\{\xi_1^\alpha \xi_2^\beta \xi_3^\gamma \cdots\} \sim \frac{1}{n^m} \sum \Big(\prod Z_{\nu\kappa}\Big)$

where only such products on the right-hand side are retained which include
exactly $m$ different $\nu$-values each appearing twice.

In analogy to (12) we compute

(18)    $\int z_\iota z_\kappa\,dV_\nu^*(z) = -p_{\nu\iota}p_{\nu\kappa} \quad (\iota \ne \kappa)$
        $\phantom{\int z_\iota z_\kappa\,dV_\nu^*(z)} = p_{\nu\iota}(1 - p_{\nu\iota}) \quad (\iota = \kappa)$

and write, for the sake of abbreviation

(19)    $P^{(\nu)}_{\iota\kappa} = p_{\nu\iota}\delta_{\iota\kappa} - p_{\nu\iota}p_{\nu\kappa} = P^{(\nu)}_{\kappa\iota}$

with the usual meaning of $\delta_{\iota\kappa}$ ($= 0$ if $\iota \ne \kappa$ and $= 1$ if $\iota = \kappa$). Then the sum
to the right in (17) includes $(2m)!/2^m$ terms, each a product of $m$ factors $P^{(\nu)}_{\iota\kappa}$.
If each of the $m$ couples $\iota, \kappa$ consists of two different figures, the respective product
appears $\alpha!\,\beta!\,\gamma! \cdots$ times; if $r$ couples are doubles ($\iota = \kappa$) the multiplicity
of the term is $2^{-r}\alpha!\,\beta!\,\gamma! \cdots$. Therefore, (17) takes the form

(20)    $n^m E_n\{\xi_1^\alpha \xi_2^\beta \xi_3^\gamma \cdots\} \sim \frac{\alpha!\,\beta!\,\gamma! \cdots}{n^m} \sum 2^{-r} P^{(\nu_1)}_{\iota_1\kappa_1} P^{(\nu_2)}_{\iota_2\kappa_2} \cdots P^{(\nu_m)}_{\iota_m\kappa_m}.$

In this sum the upper indices are any set of $m$ digits out of $1, 2, 3, \cdots, n$
and the subscripts are all sets of $m$ couples including $\alpha$ ones, $\beta$ twos, $\gamma$ threes
etc. To each such set of $m$ couples belong $\binom{n}{m}$ terms of the sum. The number
of sets of couples is bounded (independent of $n$). The exponent $r$ is the number
of doubles ($\iota = \kappa$) among the $m$ pairs.

The expression (20) admits of a transformation which renders it much more
suitable. Assume that a set of couples $\iota, \kappa$ has been chosen according to the
conditions and consider the product

(21)    $\Big(\sum_{\nu=1}^{n} P^{(\nu)}_{\iota_1\kappa_1}\Big) \Big(\sum_{\nu=1}^{n} P^{(\nu)}_{\iota_2\kappa_2}\Big) \cdots \Big(\sum_{\nu=1}^{n} P^{(\nu)}_{\iota_m\kappa_m}\Big).$

Among the $n^m$ terms which we obtain by developing (21) are all terms appearing
in the sum (20), each of them repeated $m!$ times and, in addition,

(22)    $n^m - \binom{n}{m} m! = n^m - n(n-1)(n-2) \cdots (n-m+1)$

other products of $m$ factors $P$. Since the difference (22) divided by $n^m$ goes to
zero with increasing $n$ and each $|P|$ is smaller than 1, the additional terms
have no importance. We therefore introduce the quantities

(23)    $P_{\iota\kappa} = \frac{1}{n}\sum_{\nu=1}^{n} P^{(\nu)}_{\iota\kappa} = \delta_{\iota\kappa}\,\frac{1}{n}\sum_{\nu=1}^{n} p_{\nu\iota} - \frac{1}{n}\sum_{\nu=1}^{n} p_{\nu\iota}p_{\nu\kappa}.$

Then (20) can be written as

(24)    $n^m E_n\{\xi_1^\alpha \xi_2^\beta \xi_3^\gamma \cdots\} \sim \frac{\alpha!\,\beta!\,\gamma! \cdots}{m!} \sum 2^{-r} P_{\iota_1\kappa_1} P_{\iota_2\kappa_2} \cdots P_{\iota_m\kappa_m}.$

Here we have a sum of a finite number of terms. It will be supposed in all that
follows that the $P_{\iota\kappa}$ as defined in (23) do not vanish identically as $n$ increases
indefinitely.

Since in the sum (24) no upper indices appear, equal terms repeat themselves.
We can, therefore, rearrange it, using the polynomial coefficients and absorbing
at the same time the factor $2^{-r}$. The final form of (24) is given in the following
Lemma B$_1$, which also includes a statement for the case of an uneven sum of
exponents $\alpha + \beta + \gamma + \cdots$. In fact, it is easily seen that if again half the
sum is called $m$, no group of terms on the right-hand side of (10) exists that
would supply a finite limit when divided by $n^m$. Thus we arrive at

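Lemma B$_1$. If $n\xi_\kappa$ is the numerical excess of observed over expected quantities
falling in the interval $I_\kappa$, the asymptotic expectation of the excess-power product
$\xi_1^\alpha \xi_2^\beta \xi_3^\gamma \cdots$ is given by

(25)    $(\sqrt{n})^{\alpha+\beta+\gamma+\cdots} E_n\{\xi_1^\alpha \xi_2^\beta \xi_3^\gamma \cdots\} \sim 0$    if $\alpha + \beta + \gamma + \cdots$ uneven,
        $\sim \sum \frac{\alpha!\,\beta!\,\gamma! \cdots}{\sigma_{11}!\,\sigma_{22}! \cdots \sigma_{12}!\,\sigma_{13}! \cdots} \Big(\frac{P_{11}}{2}\Big)^{\sigma_{11}} \Big(\frac{P_{22}}{2}\Big)^{\sigma_{22}} \cdots P_{12}^{\sigma_{12}} P_{13}^{\sigma_{13}} \cdots$    if $\alpha + \beta + \gamma + \cdots$ even,

the sum to be extended over all sets of non-negative integers $\sigma_{11}, \sigma_{12}, \cdots, \sigma_{22}, \cdots$
that fulfill the conditions

(25')    $\sigma_{11} = \tfrac{1}{2}(\alpha - \sigma_{12} - \sigma_{13} - \cdots), \quad \sigma_{22} = \tfrac{1}{2}(\beta - \sigma_{21} - \sigma_{23} - \cdots), \quad \cdots$

The $P_{\iota\kappa}$ as defined in (23) depend on two groups of mean values only, namely on

(25'')    $\bar p_{n\iota} = \frac{1}{n}\sum_{\nu=1}^{n} p_{\nu\iota}$ and $\overline{p_\iota p_\kappa} = \frac{1}{n}\sum_{\nu=1}^{n} p_{\nu\iota}p_{\nu\kappa}.$

Some properties of the matrix $P_{\iota\kappa}$ will be discussed in the next Section.

For practical computation, instead of (25), a recursion formula may be used
which follows immediately from (24). Writing simply $(\alpha, \beta, \gamma, \cdots)$ for the sum
in (24) the formula reads

(26)    $(\alpha, \beta, \gamma, \cdots) = \tfrac{1}{2}(\alpha - 2, \beta, \gamma, \cdots)P_{11} + \tfrac{1}{2}(\alpha, \beta - 2, \gamma, \cdots)P_{22} + \cdots$
        $+ (\alpha - 1, \beta - 1, \gamma, \cdots)P_{12} + (\alpha, \beta - 1, \gamma - 1, \cdots)P_{23} + \cdots$

If all the original distributions $V_\nu(x)$ are equal, this recursion formula, and from
it (25), can be derived almost immediately from the theorem on the multiplication
of characteristic functions with the addition of chance variables.
Note that the expectation of the product $\xi_\iota\xi_\kappa$ is $P_{\iota\kappa}/n$ for any value of $n$.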
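
The recursion can be carried out mechanically. The following Python sketch is one possible reading of (26) and not a transcription of the original computation: it assumes that $(\alpha, \beta, \gamma, \cdots)$ denotes the sum in (24) over ordered sequences of couples, each double carrying the factor $\tfrac{1}{2}$, and that the asymptotic expectation is obtained by multiplying this sum by $\alpha!\,\beta!\,\gamma! \cdots / m!$. The names `make_moment`, `couple_sum` and the matrix `P` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial


def make_moment(P):
    """Build a function returning the limit in (25) for a symmetric matrix P."""
    k = len(P)

    @lru_cache(maxsize=None)
    def couple_sum(exps):
        # The bracket (alpha, beta, gamma, ...) of the recursion (26).
        total = sum(exps)
        if total == 0:
            return Fraction(1)
        if total % 2:
            return Fraction(0)
        out = Fraction(0)
        for i in range(k):
            if exps[i] >= 2:  # a double (i, i), weighted by one half
                rest = list(exps)
                rest[i] -= 2
                out += Fraction(1, 2) * P[i][i] * couple_sum(tuple(rest))
            for j in range(i + 1, k):
                if exps[i] >= 1 and exps[j] >= 1:  # a couple of two different figures
                    rest = list(exps)
                    rest[i] -= 1
                    rest[j] -= 1
                    out += P[i][j] * couple_sum(tuple(rest))
        return out

    def moment(exps):
        m = sum(exps) // 2
        coeff = Fraction(1)
        for a in exps:
            coeff *= factorial(a)
        return coeff * couple_sum(tuple(exps)) / factorial(m)

    return moment


# Hypothetical symmetric matrix standing in for the P of (23).
P = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
moment = make_moment(P)
```

With this reading, `moment((4, 0))` gives $3P_{11}^2$ and `moment((2, 2))` gives $P_{11}P_{22} + 2P_{12}^2$, the same values that the sum in (25) yields for these exponents.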


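4. Asymptotic expectation and variance of quantics. We first state a characteristic
property of the expression (25) for the expectation of an excess power
product. Let us denote by $C_{\alpha,\beta,\gamma,\cdots}$ the right-hand side of (25) in the case
of even $\alpha + \beta + \gamma + \cdots$. Then, if $C_{\alpha,\beta,\gamma,\cdots}$ is expressed in terms of $P_{\iota\kappa}$ and each
time the subscript 2 is changed into 1, we arrive at the value of $C_{\alpha+\beta,0,\gamma,\cdots}$.
This would not be the case if $C_{\alpha,\beta,\gamma,\cdots}$ were expressed in terms of $p_\iota$, since e.g.

$C_{2,0} = P_{11} = \bar p_1 - \overline{p_1 p_1}, \qquad C_{1,1} = P_{12} = -\overline{p_1 p_2}.$

In order to prove the statement we observe that the $C_{\alpha,\beta,\gamma,\cdots}$ can be derived
from the coefficients in the development of the $m$th power of a quadric:

(27)    $\Big(\tfrac{1}{2}\sum_{\iota,\kappa} P_{\iota\kappa}\,t_\iota t_\kappa\Big)^m = m! \sum C_{\alpha,\beta,\gamma,\cdots}\,\frac{t_1^\alpha\,t_2^\beta\,t_3^\gamma \cdots}{\alpha!\,\beta!\,\gamma! \cdots}.$

It follows that

(27')    $C_{\alpha,\beta,\gamma,\cdots} = \frac{1}{m!}\,\frac{\partial^{2m}}{\partial t_1^\alpha\,\partial t_2^\beta\,\partial t_3^\gamma \cdots} \Big(\tfrac{1}{2}\sum_{\iota,\kappa} P_{\iota\kappa}\,t_\iota t_\kappa\Big)^m.$

If, in the subscripts of $P_{\iota\kappa}$, the ones and twos are identified, the quadric becomes
a function of $t_1 + t_2, t_3, t_4, \cdots$ and the derivative with respect to $\partial t_1^\alpha\,\partial t_2^\beta$ equals
the derivative with respect to $\partial t_1^{\alpha+\beta}$. On the other hand, the latter derivative
corresponds to the value of $C_{\alpha+\beta,0,\gamma,\cdots}$ in the form (27').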

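Taking $m = 2$, $\alpha = \beta = \gamma = \delta = 1$, eq. (25) supplies

(28)    $n^2 E_n\{\xi_\iota\xi_\kappa\xi_\lambda\xi_\mu\} \sim P_{\iota\kappa}P_{\lambda\mu} + P_{\iota\lambda}P_{\kappa\mu} + P_{\iota\mu}P_{\kappa\lambda}.$

According to the above statement this is correct whether $\iota, \kappa, \lambda, \mu$ are or are not
different from each other. Thus, if $\psi_{\iota\kappa\lambda\mu}$ is a symmetric set of constants, we
have

(28')    $n^2 E_n\Big\{\sum \psi_{\iota\kappa\lambda\mu}\,\xi_\iota\xi_\kappa\xi_\lambda\xi_\mu\Big\} \sim 3\sum \psi_{\iota\kappa\lambda\mu}\,P_{\iota\kappa}P_{\lambda\mu}.$

In general, the numerical factor to the right, i.e. the number of sets of couples
drawn from $2m$ figures, is $(2m)!/2^m m! = 1\cdot3 \cdots (2m - 1)$. Thus we can
state:

Lemma B$_2$. If a quantic $f_{2m}$ is defined according to (8) with symmetric coefficients,
its asymptotic expectation is given by

(29)    $n^m E_n\{f_{2m}\} \sim 1\cdot3\cdot5 \cdots (2m - 1) \sum \psi_{\iota_1\kappa_1 \cdots \iota_m\kappa_m}\,P_{\iota_1\kappa_1} \cdots P_{\iota_m\kappa_m}.$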
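
The numerical factor in (29) counts the ways of dividing $2m$ figures into $m$ couples. The following Python sketch is an illustration only, with hypothetical function names; it enumerates these divisions for small $m$ so that the count can be compared with $(2m)!/2^m m!$ and with the product $1\cdot3 \cdots (2m - 1)$.

```python
from math import factorial


def pairings(items):
    """Yield every division of an even-length list into unordered couples."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def odd_product(m):
    """The product 1 * 3 * 5 * ... * (2m - 1)."""
    out = 1
    for j in range(1, 2 * m, 2):
        out *= j
    return out


def closed_form(m):
    """The quotient (2m)! / (2^m m!)."""
    return factorial(2 * m) // (2 ** m * factorial(m))


# Number of divisions into couples for m = 1, 2, 3, 4.
pairing_counts = [len(list(pairings(list(range(2 * m))))) for m in range(1, 5)]
```

For $m = 2$ the three divisions of the figures 0, 1, 2, 3 are (01)(23), (02)(13) and (03)(12), which correspond to the three products on the right-hand side of (28).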

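Before applying this to the continuous case defined in (6), let us consider some
characteristic properties of the matrix $P_{\iota\kappa}$. According to the definition (19)
of $P^{(\nu)}_{\iota\kappa}$ we have

(30)    $\sum_{\iota,\kappa}^{1 \cdots k} P^{(\nu)}_{\iota\kappa}\,t_\iota t_\kappa = \sum_{\iota=1}^{k} p_{\nu\iota}\,t_\iota^2 - \Big(\sum_{\iota=1}^{k} p_{\nu\iota}\,t_\iota\Big)^2$

and using (9) one easily derives from Schwarz' inequality

$\sum_{\iota,\kappa}^{1 \cdots k} P^{(\nu)}_{\iota\kappa}\,t_\iota t_\kappa \ge p_{\nu,k+1} \sum_{\iota=1}^{k} p_{\nu\iota}\,t_\iota^2.$

Since $P_{\iota\kappa}$ is the arithmetical mean of the $P^{(\nu)}_{\iota\kappa}$ it follows that the matrix $P_{\iota\kappa}$
is at least semi-definite and is positive definite except when all $p_{\nu,k+1} = 0$.
In the latter case (if e.g. the $k$ intervals cover the whole $x$-axis) one has

(31)    $\sum_{\iota,\kappa}^{1 \cdots k} P_{\iota\kappa} = \frac{1}{n}\sum_{\nu=1}^{n} \Big[\sum_{\iota=1}^{k} p_{\nu\iota} - \Big(\sum_{\iota=1}^{k} p_{\nu\iota}\Big)^2\Big] = \frac{1}{n}\sum_{\nu=1}^{n} p_{\nu,k+1}(1 - p_{\nu,k+1}) = 0$

which shows that here the reciprocal matrix $P^{-1}_{\iota\kappa}$ does not exist.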

In the “complete” case, that is, with all $p_{\nu,k+1} = 0$, the elements in each
horizontal or vertical line of the matrix $P_{\iota\kappa}$ have the sum zero. It follows that
the $k$ homogeneous equations $\sum_\iota P_{\iota\kappa}x_\iota = 0$ have the solution $x_1 = x_2 = \cdots = x_k$
and, therefore, that the cofactors of all elements of $P_{\iota\kappa}$ have one and the same
value. For each single $\nu$ the determinant of $P^{(\nu)}_{\iota\kappa}$ can be computed:

$|P^{(\nu)}_{\iota\kappa}| = p_{\nu 1}p_{\nu 2} \cdots p_{\nu k}\,p_{\nu,k+1}.$

If this is applied to the principal minors of the same determinant in the case
$p_{\nu,k+1} = 0$, one finds the characteristic equation of the matrix $P^{(\nu)}_{\iota\kappa}$ to be

$|\delta_{\iota\kappa} - \lambda P^{(\nu)}_{\iota\kappa}| = -\frac{d}{d\lambda}\,[(1 - \lambda p_{\nu 1})(1 - \lambda p_{\nu 2}) \cdots (1 - \lambda p_{\nu k})].$

This shows that $(k - 1)$ characteristic roots separate the abscissas $1/p_{\nu 1},
1/p_{\nu 2}, \cdots, 1/p_{\nu k}$ (one root being zero).
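
These matrix properties can be verified on a small example with exact rational arithmetic. The following Python sketch is a check under stated assumptions and not the author's derivation: the helper names (`det`, `excess_matrix`, `char_det`, `minus_derivative`) and the probability vectors are hypothetical, and the determinant routine is an ordinary Gaussian elimination written out so that the block needs no external library.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return result


def excess_matrix(p):
    """Matrix with elements p_i * delta_ij - p_i * p_j, as in definition (19)."""
    k = len(p)
    return [[(p[i] if i == j else 0) - p[i] * p[j] for j in range(k)] for i in range(k)]


def char_det(p, lam):
    """Determinant of (identity - lam * excess matrix)."""
    P = excess_matrix(p)
    k = len(p)
    return det([[(1 if i == j else 0) - lam * P[i][j] for j in range(k)] for i in range(k)])


def minus_derivative(p, lam):
    """Minus the derivative, with respect to lam, of the product of (1 - lam * p_i)."""
    total = Fraction(0)
    for i in range(len(p)):
        term = p[i]
        for j in range(len(p)):
            if j != i:
                term *= 1 - lam * p[j]
        total += term
    return total


# Hypothetical probabilities: a "complete" case (sum 1) and an incomplete one.
complete = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
incomplete = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 4)]
```

In the complete case every row sums to zero and the determinant vanishes; in the incomplete case the determinant equals the product of all $k + 1$ probabilities, here $\tfrac{1}{5}\cdot\tfrac{3}{10}\cdot\tfrac{1}{4}\cdot\tfrac{1}{4}$.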

The number $k$ of intervals has nothing to do with the preceding argument
leading to the eqs. (25) to (28). Also, the entire computation can be repeated
in terms of $dT_n(x_1), dT_n(x_2), dT_n(x_3), \cdots$ instead of $\xi_1, \xi_2, \xi_3, \cdots$ if appropriate
differentials are substituted for the $P_{\iota\kappa}$. To find the latter ones we note
that $p_{\nu\iota}$ stands for the increment $dV_\nu(x)$. Thus, using $\delta(x, y)$ in analogy to
$\delta_{\iota\kappa}$ ($= 1$ for $x = y$ and $= 0$ for $x \ne y$) we set

(32)    $dU_\nu(x, y) = \delta(x, y)\,dV_\nu(x) - dV_\nu(x)\,dV_\nu(y) = \delta(x, y)\,dV_\nu(x) - dW_\nu(x, y)$

which is equivalent to the definition of a function of 2 variables:

(33)    $U_\nu(x, y) = V_\nu(x) - V_\nu(x)V_\nu(y) = V_\nu(x) - W_\nu(x, y) \quad (x \le y)$
        $\phantom{U_\nu(x, y)} = V_\nu(y) - V_\nu(x)V_\nu(y) = V_\nu(y) - W_\nu(x, y) \quad (x \ge y).$

Then $P_{\iota\kappa}$ has to be replaced by

(34)    $dU_n(x, y) = \frac{1}{n}\sum_{\nu=1}^{n} dU_\nu(x, y) = \delta(x, y)\,d\bar V_n(x) - dW_n(x, y).$

This $dU_n(x, y)$ is $n$ times the expectation of $dT_n(x)\,dT_n(y)$.
The function

(35)    $U_n(x, y) = \frac{1}{n}\sum_{\nu=1}^{n} U_\nu(x, y)$

is the difference of two cumulative distribution functions, one corresponding to
a distribution along the straight line $x = y$ with the element $d\bar V_n(x)$ and another
distribution over the whole plane with the element

(35')    $dW_n(x, y) = \frac{1}{n}\sum_{\nu=1}^{n} dV_\nu(x)\,dV_\nu(y).$

To each one-dimensional distribution $V_\nu(x)$ belongs one “distribution excess”
$U_\nu(x, y)$ as defined in (33). The $P^{(\nu)}_{\iota\kappa}$ are the increments of $U_\nu(x, y)$ within
the product interval $dx\,dy$. It is seen from the preceding argument that the
asymptotic moments of any quantic (6) or (8) depend only on the average $U_n$
of the distribution excesses $U_\nu$.

If a quantic is defined by (6) and the integrals on both sides exist, the asymptotic
expectation of $f_{2m}$ may be written in formal analogy to (29) as

(36)    $n^m E_n\{f_{2m}\} \sim 1\cdot3\cdot5 \cdots (2m - 1) \int\!\!\int \cdots \int \psi(x_1, x_2, \cdots, x_{2m})\,dU_n(x_1, x_2)\,dU_n(x_3, x_4) \cdots dU_n(x_{2m-1}, x_{2m}).$

This formula is identical with (29) if $\psi$ has constant values in a finite number
of intervals and vanishes outside these intervals. But it will be seen in the next
section that (36) can be used in more general cases also.

For the sake of practical computation one may develop the right-hand side
of (36) into terms explicitly depending on the given averages $\bar V_n(x)$ and $W_n(x, y)$.
For example, in the case $m = 3$:

(37)    $n^3 E_n\{f_6\} \sim 1\cdot3\cdot5 \iiint \big[\,\psi(x_1, x_1, x_2, x_2, x_3, x_3)\,d\bar V_n(x_1)\,d\bar V_n(x_2)\,d\bar V_n(x_3)$
        $- 3\,\psi(x_1, x_1, x_2, x_2, x_3, x_4)\,d\bar V_n(x_1)\,d\bar V_n(x_2)\,dW_n(x_3, x_4)$
        $+ 3\,\psi(x_1, x_1, x_2, x_3, x_4, x_5)\,d\bar V_n(x_1)\,dW_n(x_2, x_3)\,dW_n(x_4, x_5)$
        $- \psi(x_1, x_2, x_3, x_4, x_5, x_6)\,dW_n(x_1, x_2)\,dW_n(x_3, x_4)\,dW_n(x_5, x_6)\big].$

In the general case, the numerical factors in the $m$-tuple integral are the binomial
coefficients of order $m$.

The higher moments of quantics $f_m$ can be computed in the same way as
$E_n\{f_m\}$ since any power of $f_m$ is a quantic again. The formulas, however, become
more involved since the coefficients of a power of $f_m$ are not immediately given in a
symmetric form. It will suffice to show here how the (second order) variance
of $f_2$ can be found. The second moment is the expectation of

(39)    $f_2^2 = \iiiint \psi(x, y)\psi(z, u)\,dT_n(x)\,dT_n(y)\,dT_n(z)\,dT_n(u).$

Applying here eq. (28) we have

(40)    $n^2 E_n\{f_2^2\} \sim \iiiint \psi(x, y)\psi(z, u)\,[dU_n(x, y)\,dU_n(z, u) + dU_n(x, z)\,dU_n(y, u) + dU_n(x, u)\,dU_n(y, z)].$

The first term in the brackets leads to the square of $nE_n\{f_2\}$ while the second
and third terms, due to the symmetry of $\psi(x, y)$, supply two equal integrals.
Thus

(41)    $\mathrm{Var}\,\{nf_2\} \sim 2\iiiint \psi(x, y)\psi(z, u)\,dU_n(x, z)\,dU_n(y, u)$
        $= 2\Big[\iint \psi(x, y)\psi(x, y)\,d\bar V_n(x)\,d\bar V_n(y) - 2\iiint \psi(x, y)\psi(y, z)\,d\bar V_n(y)\,dW_n(x, z)$
        $+ \iiiint \psi(x, y)\psi(z, u)\,dW_n(x, z)\,dW_n(y, u)\Big].$

In the same way moments and variances of any order can be computed for any
quantic $f_m$.


5. Final statement on the limit of expectation of quantics. We shall prove
the following:

Lemma B$_3$. Given a sequence of distributions $V_1(x), V_2(x), V_3(x), \cdots$ and a
quantic of order $2m$

$f_{2m} = \int\!\!\int \cdots \int \psi(x_1, x_2, \cdots, x_{2m})\,dT_n(x_1)\,dT_n(x_2) \cdots dT_n(x_{2m})$

assume that there exist a continuous function $\Psi(x)$ and a distribution $V(x)$ such that

(42)    $|\psi(x_1, x_2, \cdots, x_{2m})| \le \Psi(x_1)\Psi(x_2) \cdots \Psi(x_{2m})$
        $dV_\nu(x) \le dV(x)$ for $|x| > X$, $\nu = 1, 2, 3, \cdots$

and that the integrals

(42')    $\int \Psi^r(x)\,dV(x), \quad (r = 1, 2, \cdots, 2m),$

have finite values. Then, for any $\delta > 0$

(43)    $\lim_{n=\infty} n^{m-\delta} E_n\{f_{2m}\} = 0.$

This lemma, on which the main theorem of Part II is based, will be established
if it is shown that the formula (36) holds true for functions $\psi$ satisfying
the conditions (42).

In the transition from the complete expression (10) for the expectation $E_n$
to the asymptotic value (25) two essential steps were made. First, certain
products of the form (11) have been omitted and, second, certain products
of $P^{(\nu)}_{\iota\kappa}$ as defined in (19) have been arbitrarily added. This was allowed because
each of the products was seen to be smaller than 1 and their number was
of the order $O(n^{m-1})$. If a quantic in integral form (6) is considered which
involves an infinite number of expressions like (10), a sharper estimate is
necessary.

It is easily seen that each integral (11') is a polynomial in $p_{\nu\kappa}$ including the
product $p_{\nu 1} p_{\nu 2} p_{\nu 3} \cdots$ and another factor which is certainly bounded whatever
the $p_{\nu\kappa}$ are. Thus, if the expectation of $\xi_1 \xi_2 \cdots \xi_{2m}$ is computed, each term of the
form (11') consists of a finite factor and the product $p_{\nu 1} p_{\nu 2} \cdots p_{\nu, 2m}$. In passing
to the expectation of the quantic, the $p_{\nu\kappa}$ have to be replaced by $dV_\nu(x_\kappa)$ and
each neglected term in (10) leads to an expression like

(45) $\iint \cdots \int \psi(x_1, x_2, \dots, x_{2m})\, dV_{\nu_1}(x_1)\, dV_{\nu_2}(x_2) \cdots dV_{\nu_{2m}}(x_{2m}).$

According to the assumptions of $B_3$ this integral has a finite value. The number
of neglected terms being of the order $O(n^{m-1})$ the omission of these terms is
justified.

On the other hand, products of $P^{(\nu)}_{\iota\kappa}$ equal, except for the sign, products
of $p_{\nu\iota}\, p_{\nu\kappa}$ as long as $\iota \ne \kappa$ and, except for a finite factor, products of $p_{\nu\iota}$ as often
as $\iota = \kappa$. Again it is seen that the arbitrarily added terms sum up to integrals
of the form (45). This shows that here too, if the conditions of $B_3$ are fulfilled,
the procedure leading to (25) may be applied.

It follows that, under the conditions (42), if the integral (42') has a finite
value, eq. (36) is correct and (43) is an immediate consequence of it. On the
other hand, it is obvious that weaker conditions than those given in $B_3$ would
suffice to establish (43).


6. Theorem on products of n functions. The principal source of all explicit
formulas on asymptotic distributions lies in certain properties of products of a
great number of factors. Laplace devoted a part of his fundamental Treatise
of Probability to these problems, but a complete outline of all results from a
modern point of view is still lacking. In the third part of the present paper, a
rather simple statement on this line will be used which may be formulated here as
Lemma C. Let $F_\nu(z_1, z_2, \dots, z_k)$, $(\nu = 1, 2, 3, \dots)$, be a sequence of analytic
functions of k complex variables and $G_n$ the product $F_1 F_2 \cdots F_n$. Suppose that
at the point $z_1 = z_2 = \cdots = z_k = 0$ all $F_\nu$ have the value 1, vanishing first derivatives,
and the second derivatives

(46) $A^{(\nu)}_{\iota\kappa} = \dfrac{\partial^2 F_\nu}{\partial z_\iota\, \partial z_\kappa}.$

Then

(47) $\lim_{n \to \infty} \Big[ \log G_n\Big(\dfrac{z_1}{\sqrt n}, \dfrac{z_2}{\sqrt n}, \dots, \dfrac{z_k}{\sqrt n}\Big) - \dfrac{1}{2n} \sum_{\nu=1}^{n} \sum_{\iota,\kappa} A^{(\nu)}_{\iota\kappa}\, z_\iota z_\kappa \Big] = 0$

uniformly in each bounded region $|z_\iota| \le Z$ in which the absolute values of the third
derivatives of all $F_\nu$ have an upper bound M.

In fact, the Taylor development of $F_\nu$ supplies under the conditions stated:

(48) $F_\nu(z_1, z_2, \dots, z_k) = 1 + \tfrac12 \sum_{\iota,\kappa} A^{(\nu)}_{\iota\kappa}\, z_\iota z_\kappa + O(Z^3)$

and, therefore,

(48') $\log F_\nu(z_1, z_2, \dots, z_k) = \tfrac12 \sum_{\iota,\kappa} A^{(\nu)}_{\iota\kappa}\, z_\iota z_\kappa + O(Z^3).$

If here all $z_\iota$ are replaced by $z_\iota/\sqrt n$ and the equations added for $\nu = 1, 2, \dots, n$
we obtain

(49) $\log G_n\Big(\dfrac{z_1}{\sqrt n}, \dfrac{z_2}{\sqrt n}, \dots, \dfrac{z_k}{\sqrt n}\Big) = \dfrac{1}{2n} \sum_{\nu=1}^{n} \sum_{\iota,\kappa} A^{(\nu)}_{\iota\kappa}\, z_\iota z_\kappa + n\, O\Big(\dfrac{Z^3}{n^{3/2}}\Big)$

and this shows that the brackets on the left-hand side of (47) are $O(Z^3/\sqrt n)$.
It is obvious that (47) would still hold if the condition concerning the third
derivatives is replaced by a somewhat weaker one.
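
The statement of Lemma C can be checked numerically in the simplest case k = 1. The sketch below is only an illustration under stated assumptions: the factors $F_\nu(z) = p_\nu e^{(1-p_\nu) z} + (1-p_\nu) e^{-p_\nu z}$ and the sequence $p_\nu$ are chosen here for convenience (they are not taken from the text), and the function names are hypothetical. Each factor has value 1, vanishing first derivative and second derivative $p_\nu(1-p_\nu)$ at z = 0, so the difference between $\log G_n(z/\sqrt n)$ and the quadratic term should shrink roughly like $1/\sqrt n$.

```python
import math

def F(z, p):
    """Illustrative factor with F(0) = 1, F'(0) = 0, F''(0) = p * (1 - p)."""
    q = 1.0 - p
    return p * math.exp(q * z) + q * math.exp(-p * z)

def probabilities(n):
    """Hypothetical bounded sequence p_nu, used only for this demonstration."""
    return [0.2 + 0.1 * (nu % 3) for nu in range(1, n + 1)]

def bracket(z, n):
    """log G_n(z / sqrt(n)) minus (1 / 2n) * sum_nu F_nu''(0) * z**2."""
    ps = probabilities(n)
    log_G = math.fsum(math.log(F(z / math.sqrt(n), p)) for p in ps)
    quad = math.fsum(p * (1.0 - p) for p in ps) * z * z / (2.0 * n)
    return log_G - quad

for n in (10, 100, 1000, 10000):
    # The printed differences decrease as n grows.
    print(n, bracket(2.0, n))
```

Choosing factors with a non-vanishing third derivative at the origin makes the $1/\sqrt n$ rate visible; with symmetric factors the difference would decay faster.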





PART II. DIFFERENTIABLE STATISTICAL FUNCTIONS 

1. Definitions. We consider a one-dimensional cumulative distribution function
$V(x)$ as a point in the V-space. If two points $V_1(x)$ and $V_2(x)$ are given
the functions

(1) $V_1(x) + t[V_2(x) - V_1(x)], \quad 0 \le t \le 1$

represent the straight segment between $V_1(x)$ and $V_2(x)$. A subset of the V-space
that includes all segments determined by its elements is called a convex domain.

Now, assume that a sequence of collectives with the distributions $V_1(x)$,
$V_2(x)$, $V_3(x)$, $\dots$ be given. We shall consider functions $f\{V(x)\}$ defined in a
convex domain that includes particularly: (1) all average distributions $\bar V_n(x)$

(2) $\bar V_n(x) = \dfrac{1}{n} \sum_{\nu=1}^{n} V_\nu(x)$

at least from a certain n on; (2) all repartitions $S_n(x)$ that can occur, i.e. the
repartitions of n quantities that belong to the label sets of the given collectives
(e.g. positive x, etc.). If $V^0(x)$ and $V(x)$ are any two points of the domain, the
quantity

(3) $F(t) = f\{V^0(x) + t[V(x) - V^0(x)]\}, \quad 0 \le t \le 1$

is a function of the real variable t. It will be supposed to admit derivatives
with respect to t up to the order r + 1.

Following Volterra [9, 10] we define (in a slightly modified way) the derivative
$f'$ of a statistical function f in analogy to the set of partial derivatives of a
function of several variables. If $V(x)$ would stand for a set of distinct variables
$V_1, V_2, V_3, \dots$ and $V^0(x)$ for their initial values $V_1^0, V_2^0, V_3^0, \dots$ one would
have

$\dfrac{d}{dt} f\{V^0(x) + t[V(x) - V^0(x)]\}_{t=0} = \sum_\nu \dfrac{\partial f}{\partial V_\nu} (V_\nu - V_\nu^0)$

where $\partial f/\partial V_\nu$ is the partial derivative of f with respect to $V_\nu$ taken at the point
$V_\nu = V_\nu^0$. Thus we write

(4) $\dfrac{d}{dt} f\{V^0(x) + t[V(x) - V^0(x)]\}_{t=0} = \int f'\{V^0(x), y\}\, d(V - V^0)(y)$

and call $f'$, which depends on $V^0(x)$ and on a scalar variable y, but not on $V(x)$,
the (first) derivative of $f\{V(x)\}$ at the point $V^0(x)$. Only if a relation (4) is
fulfilled for any two points of the convex domain, f is called a (one time)
differentiable function.

The derivative of a linear function

(5) $A = \int \alpha(x)\, dV(x), \quad B = \int \beta(x)\, dV(x), \ \dots$

is simply the factor $\alpha(y)$, $\beta(y)$, $\dots$ respectively, independent of the point at
which the derivative is taken. If f is given as a function of A, B, $\dots$ one has

(6) $f'\{V(x), y\} = \alpha(y) \dfrac{\partial f}{\partial A} + \beta(y) \dfrac{\partial f}{\partial B} + \cdots.$

The derivative of the non-linear function

(7) $f = \iint \psi(x, y)\, dV(x)\, dV(y)$

is

(8) $f'\{V^0(x), y\} = \int [\psi(x, y) + \psi(y, x)]\, dV^0(x).$

Note that an additive constant in $f'$ (i.e. a quantity independent of y) has no
significance since the integral of $d(V - V^0)$ vanishes. It follows from (6)
that the first derivative of the mth order variance as defined in (2) of the
Introduction, at the point $V^0(x)$ is

(9) $(y - a_0)^m - m y \int (x - a_0)^{m-1}\, dV^0(x)$

where $a_0$ is the mean value of $V^0(x)$.
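
Formula (9) can be compared with the defining relation (4) on discrete distributions. The following sketch is an illustration only: the support points, the two weight vectors and all function names are assumptions made for the example. It evaluates the t-derivative in (4) by a central difference and compares it with the integral of the expression (9) against $d(V - V^0)$.

```python
def central_moment(xs, ws, m):
    """m-th order variance (central moment) of the weights ws on the points xs."""
    a = sum(w * x for x, w in zip(xs, ws))
    return sum(w * (x - a) ** m for x, w in zip(xs, ws))

def derivative_9(xs, w0, m, y):
    """Expression (9): (y - a0)^m - m * y * integral of (x - a0)^(m-1) dV0(x)."""
    a0 = sum(w * x for x, w in zip(xs, w0))
    mu = sum(w * (x - a0) ** (m - 1) for x, w in zip(xs, w0))
    return (y - a0) ** m - m * y * mu

def directional_derivative(xs, w0, w1, m, t=1e-5):
    """Central-difference estimate of d/dt f{V0 + t (V - V0)} at t = 0."""
    def mix(s):
        return [a + s * (b - a) for a, b in zip(w0, w1)]
    return (central_moment(xs, mix(t), m) - central_moment(xs, mix(-t), m)) / (2 * t)

xs = [0.0, 1.0, 2.5, 4.0]          # assumed common support
w0 = [0.1, 0.4, 0.3, 0.2]          # weights of V0
w1 = [0.25, 0.25, 0.25, 0.25]      # weights of V
m = 3

lhs = directional_derivative(xs, w0, w1, m)
rhs = sum((b - a) * derivative_9(xs, w0, m, y) for y, a, b in zip(xs, w0, w1))
print(lhs, rhs)  # the two values agree up to the finite-difference error
```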

In the same way, derivatives of higher order can be introduced. The second
derivative of $f\{V(x)\}$ is a function of $V^0(x)$, i.e. of the point at which the
derivative is taken, and of two scalar variables y, z which correspond to the two
subscripts in the case of a function of distinct variables. The definition of
$f''\{V^0(x), y, z\}$ is given in the equation

(10) $\dfrac{d^2}{dt^2} f\{V^0(x) + t[V(x) - V^0(x)]\}_{t=0} = \iint f''\{V^0(x), y, z\}\, d(V - V^0)(y)\, d(V - V^0)(z).$

The second derivative of a linear function is zero. The function (7) has the
second derivative $\psi(z, y) + \psi(y, z)$ independently of $V^0(x)$. The mth order
variance gives, twice differentiated

(11) $-2 m z (y - a_0)^{m-1} + m(m - 1)\, y z \int (x - a_0)^{m-2}\, dV^0(x).$

The variables y and z in $f''$ or in any additive term of $f''$ may be interchanged
and a term depending on one of them may be added or omitted. Thus, $f''$
can always be written as a symmetric function of y, z without linear terms.
Accordingly, the second derivative of (7) is also $2\psi(y, z)$.
The derivative of rth order of f at the point $V^0(x)$ will be defined by the
equation

(12) $\dfrac{d^r}{dt^r} f\{V^0(x) + t[V(x) - V^0(x)]\}_{t=0} = \iint \cdots \int f^{(r)}\{V^0(x), y_1, y_2, \dots, y_r\}\, d(V - V^0)(y_1) \cdots d(V - V^0)(y_r).$

Here, for given $V^0(x)$, $f^{(r)}$ may be supposed to be a symmetric function of the r
variables $y_1, y_2, \dots, y_r$. The rth derivative of the mth order variance is

(13) $\dfrac{(-1)^r\, m!}{(m - r + 1)!}\; y_1 y_2 \cdots y_r \Big[ (m - r + 1) \int (x - a_0)^{m-r}\, dV^0(x) - \sum_{\kappa=1}^{r} \dfrac{(y_\kappa - a_0)^{m-r+1}}{y_\kappa} \Big].$

In the case r = m the expression becomes independent of $V^0(x)$, viz.

(13') $(-1)^m\, m!\; y_1 y_2 \cdots y_m\, (1 - m)$

where terms depending on less than r of the variables $y_1, y_2, \dots, y_r$ have been
omitted.

If the definitions (4), (10), (12) are confronted one can see that $f''\{V, y, z\}$
is the first derivative of $f'\{V, y\}$ etc. For proofs see [9] and [10].


2. Taylor development. The function $F(t)$ defined in (3) admits the
development

(14) $F(1) - F(0) = F'(0) + \dfrac{1}{2} F''(0) + \cdots + \dfrac{1}{r!} F^{(r)}(0) + \dfrac{1}{(r+1)!} F^{(r+1)}(\vartheta)$

where $\vartheta$ is some quantity between zero and one. According to (3) the left-hand
side equals the difference $f\{V(x)\} - f\{V^0(x)\}$. The expressions $F'(0), F''(0), \dots,$
$F^{(r)}(0)$ are the derivatives as defined in eqs. (4), (10), (12). In the last term
to the right, one has to introduce the distribution

(15) $V'(x) = V^0(x) + \vartheta[V(x) - V^0(x)]$

and then to take the (r + 1)st derivative of f at the point $V'(x)$.

For a given $V^0(x)$ each one of the terms on the right-hand side of (14) is a
function of $V(x)$. Except for the last one, in which $\vartheta$ depends in a certain way
on $V(x)$, they are quantics with respect to $V(x) - V^0(x)$, of the same kind as
those considered in Part I. (There we had $S_n$ instead of V and $\bar V_n$ instead
of $V^0$.)
The rth term of (14) can be written as

(16) $F_r = \dfrac{1}{r!} \iint \cdots \int \psi(x_1, x_2, \dots, x_r)\, d(V - V^0)(x_1) \cdots d(V - V^0)(x_r)$

where according to (12)

(16') $\psi(x_1, x_2, \dots, x_r) = f^{(r)}\{V^0(x), x_1, x_2, \dots, x_r\}.$

To find the characteristic properties of $F_r$ we compute its derivatives at a point
$V_1(x)$. To do this we must replace in (16) the $V(x)$ by

$V_1(x) + t[V(x) - V_1(x)]$

then differentiate the product

(17) $\prod_{\kappa=1}^{r} d[(V_1 - V^0)(x_\kappa) + t (V - V_1)(x_\kappa)]$

with respect to t, and finally set t = 0. The derivative consists of r terms
the first of which will be

$d(V - V_1)(x_1) \prod_{\kappa=2}^{r} d(V_1 - V^0)(x_\kappa).$

Due to the fact that $\psi$ may be supposed as a symmetric function, all r terms
supply the same integral. Thus the derivative of $F_r$ with respect to t at the point
t = 0 can be written as

$\dfrac{1}{(r-1)!} \iint \cdots \int \psi(x_1, x_2, \dots, x_r)\, d(V - V_1)(x_1) \prod_{\kappa=2}^{r} d(V_1 - V^0)(x_\kappa).$

Comparing this with the formula (4) which defines the first derivative of a
statistical function and writing y instead of $x_1$ and $V(x)$ instead of $V_1(x)$, we find

(18) $F_r'\{V(x), y\} = \dfrac{1}{(r-1)!} \iint \cdots \int \psi(y, x_2, \dots, x_r)\, d(V - V^0)(x_2) \cdots d(V - V^0)(x_r).$

This is the first derivative of $F_r\{V(x)\}$ at the point $V(x)$. It vanishes at the
point $V(x) = V^0(x)$.

The integral in (18) has the same form as that in (16) except that its
multiplicity is (r - 1) rather than r. Thus it is immediately seen how the higher
derivatives of $F_r$ can be found. For the second derivative $F_r''\{V(x), y, z\}$
we have simply to replace (r - 1)! in (18) by (r - 2)!, then $x_2$ by z and finally
to omit in the product the differential $d(V - V^0)(x_2)$. This procedure can be
continued up to the derivative of order (r - 1). The rth derivative, finally,
will be

(19) $F_r^{(r)}\{V(x), y_1, y_2, \dots, y_r\} = \psi(y_1, y_2, \dots, y_r)$

independent of $V(x)$ and, according to (16'), equal to the rth derivative of
$f\{V(x)\}$ at the point $V^0(x)$. It is also seen that all integrals of the form (16)
or (18) vanish if $V(x)$ equals $V^0(x)$. The results can be summarized as follows:
The sth term, $(s = 1, 2, \dots, r)$, of the development (14) is a function of $V(x)$
for which all derivatives at the point $V^0(x)$ except that of order s vanish while
this one equals the sth derivative of the original function $f\{V(x)\}$ at $V^0(x)$.
The complete analogy of (14) with the Taylor development of a function of
distinct variables is thus evident.
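
For the quadratic function (7) the development (14) stops after the second term, so it can be verified exactly on discrete distributions. The sketch below is an illustration under assumptions: the kernel `psi`, the support points and the weights are arbitrary choices, and the function names are hypothetical. The first term uses the derivative (8), the second the derivative $\psi(z, y) + \psi(y, z)$ inserted in (10).

```python
def quantic(psi, xs, w):
    """f{V} = double sum of psi(x, y) against the weights w, as in (7)."""
    return sum(psi(x, y) * wx * wy for x, wx in zip(xs, w) for y, wy in zip(xs, w))

def first_term(psi, xs, w0, w1):
    """F'(0): integral of f'{V0, y} from (8) against d(V - V0)(y)."""
    d = [b - a for a, b in zip(w0, w1)]
    def fprime(y):
        return sum((psi(x, y) + psi(y, x)) * wx for x, wx in zip(xs, w0))
    return sum(fprime(y) * dy for y, dy in zip(xs, d))

def second_term(psi, xs, w0, w1):
    """F''(0): double integral of psi(z, y) + psi(y, z) against d(V - V0) twice."""
    d = [b - a for a, b in zip(w0, w1)]
    return sum((psi(z, y) + psi(y, z)) * dy * dz
               for y, dy in zip(xs, d) for z, dz in zip(xs, d))

def psi(x, y):
    """Assumed kernel, deliberately not symmetric."""
    return x * y * y + max(x, y)

xs = [0.0, 1.0, 2.0, 3.5]
w0 = [0.4, 0.3, 0.2, 0.1]
w1 = [0.1, 0.2, 0.3, 0.4]

lhs = quantic(psi, xs, w1) - quantic(psi, xs, w0)
rhs = first_term(psi, xs, w0, w1) + 0.5 * second_term(psi, xs, w0, w1)
print(lhs, rhs)  # equal up to rounding: no remainder term for a second-order quantic
```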

If we assume that $f\{V(x)\}$ is a function whose first (r - 1) derivatives vanish
at the point $V^0(x)$, eq. (14) takes the form

(20) $f\{V(x)\} - f\{V^0(x)\} = \dfrac{1}{r!} \iint \cdots \int f^{(r)}\{V^0(x), y_1, y_2, \dots, y_r\}\, d(V - V^0)(y_1) \cdots d(V - V^0)(y_r)$
     $\qquad + \dfrac{1}{(r+1)!} \iint \cdots \int f^{(r+1)}\{V'(x), y_1, y_2, \dots, y_{r+1}\}\, d(V - V^0)(y_1) \cdots d(V - V^0)(y_{r+1}).$

By applying to this formula the lemmas A and B of Part I, we shall arrive at
the general theorem on asymptotic distributions that is the principal goal of
this paper.


3. General theorem. The main result to be derived in the general theory of
asymptotic distributions is that the so-called normal distribution represents
the first element in an infinite sequence which includes the asymptotic
distributions of all differentiable statistical functions, except certain irregular
cases. The Gauss distribution covers in fact only those functions whose Taylor
development starts with the first (linear) term, in particular the linear statistical
functions themselves. If the first (r - 1) terms in the development vanish,
the asymptotic distribution of type r becomes valid.

Theorem I: Let $V_1(x), V_2(x), V_3(x), \dots$ be an infinite sequence of distributions,
and $f\{V(x)\}$ a statistical function with derivatives up to order (r + 1). Denote by
$S_n(x)$ the repartition of the n label values in the collective with the distribution element
$dV_1(x_1)\, dV_2(x_2) \cdots dV_n(x_n)$ and by $\bar V_n(x)$ the arithmetical mean of $V_1(x)$,
$V_2(x), \dots, V_n(x)$. If for large n the first (r - 1) derivatives of $f\{V(x)\}$ at the
point $\bar V_n(x)$ vanish and the rth derivative equals $\psi_n(y_1, y_2, \dots, y_r)$, then the
distribution of

(21) $A_n = n^{r/2} [f\{S_n(x)\} - f\{\bar V_n(x)\}]$

is asymptotically equal to the distribution of the rth order quantic

(22) $B_n = \dfrac{n^{r/2}}{r!} \iint \cdots \int \psi_n(x_1, x_2, \dots, x_r)\, d(S_n - \bar V_n)(x_1)\, d(S_n - \bar V_n)(x_2) \cdots d(S_n - \bar V_n)(x_r)$

under the following conditions:

a) The distribution of (22) has a uniformly bounded derivative for all n;

b) Within a convex domain in the V-space that includes all $\bar V_n(x)$ from a certain
n on, and all $S_n(x)$ that can occur, the (r + 1)st derivative of $f\{V(x)\}$ is smaller
in absolute value than a product $\Psi(y_1)\, \Psi(y_2) \cdots \Psi(y_{r+1})$ whereby the
integrals $\int [\Psi(x)]^h\, dV_\nu(x)$ for $h = 1, 2, \dots, 2(r + 1)$ have a finite upper
bound for $\nu = 1, 2, 3, \dots$.

In order to prove this we introduce in eq. (20) $S_n(x)$ for $V(x)$ and $\bar V_n(x)$ for
$V^0(x)$, and multiply both sides by $n^{r/2}$. Using the notations (21) and (22) and
writing $T_n$ for $(S_n - \bar V_n)$, the equation reads

(23) $A_n - B_n = \dfrac{n^{r/2}}{(r+1)!} \iint \cdots \int f^{(r+1)}\{V'(x), y_1, y_2, \dots, y_{r+1}\}\, dT_n(y_1) \cdots dT_n(y_{r+1}).$

According to Lemma A the theorem will be verified if we can show that the
expectation of the absolute value of the right-hand expression in (23) tends to
zero.

According to the Schwarz inequality one has, for any real C:

(24) $E_n\{|C|\} \le \sqrt{E_n\{C^2\}}.$

For fixed values of $\bar V_n$ and $S_n$ the integral on the right-hand side of (23) is a
quantic of order (r + 1) with the coefficients $\psi_{r+1}(y_1, y_2, \dots, y_{r+1})$. The
square of this integral is a quantic of order 2(r + 1) whose coefficients are a finite
number (depending only on r) of terms each of which is a product of two $\psi_{r+1}$-values
implying 2(r + 1) variables $y_1, y_2, \dots, y_{2(r+1)}$. The absolute value of
these coefficients is, therefore, according to the condition b) smaller than a
finite factor times the product $\Psi(y_1)\, \Psi(y_2) \cdots \Psi(y_{2(r+1)})$ and thus fulfills the
condition of lemma $B_3$. If the right-hand side of (23) is identified with C, the
expectation of $C^2$ is, except for a finite factor, the product of $n^r$ times the expectation
of the above-mentioned quantic of order 2(r + 1). It then follows from lemma
$B_3$ that the limit of $E_n\{C^2\}$ is zero and from (24):

$\lim E_n\{|C_n|\} = \lim E_n\{|A_n - B_n|\} = 0.$

This accomplishes the proof of Theorem I.

If we apply here what was shown in Part I about the asymptotic distribution
of a quantic, we can also state the following.
Theorem II: Under the conditions of Theorem I, the asymptotic distribution of a
differentiable statistical function $f\{S_n(x)\}$ is essentially determined by

a) the average distribution $\bar V_n(x)$;

b) the first non-vanishing derivative of $f\{V(x)\}$ at the point $\bar V_n(x)$;

c) the average distribution excess

(25) $U_n(x, y) = \bar V_n(x) - \dfrac{1}{n} \sum_{\nu=1}^{n} V_\nu(x) V_\nu(y), \quad x \le y,$
     $U_n(x, y) = \bar V_n(y) - \dfrac{1}{n} \sum_{\nu=1}^{n} V_\nu(x) V_\nu(y), \quad x \ge y.$

By "essentially determined" is meant determined except for an additional
function whose moments of any order are zero. The statement then follows
from Theorem I in connection with the fact that the asymptotic moments of
quantics have been computed in Part I from the values of $U_n(x, y)$.

That functions with all moments vanishing exist has been known for a long
time. A simple example given by Shohat and Tamarkin [6] is the following.
Let k be a positive constant smaller than $\tfrac12$ and $u = x^k$, $\lambda = \tan k\pi$. Then,
the density (positive or negative)

(26) $\varphi(x) = e^{-u} \sin(\lambda u) = \operatorname{Im}\, e^{-u(1 - i\lambda)}$

fulfills the condition. In fact, the nth moment of (26) is the (vanishing)
imaginary part of the integral

(27) $\dfrac{1}{k} \int_0^\infty u^{(n+1)/k - 1}\, e^{-u(1 - i\lambda)}\, du = \dfrac{(-1)^{n+1}}{k}\, (\cos k\pi)^{(n+1)/k}\; \Gamma\Big(\dfrac{n+1}{k}\Big).$

Since $\varphi(x)$ takes negative values of the amount $e^{-u}$ it can be superimposed to a
given distribution density only in cases where the original density remains
greater than some multiple of $e^{-u} = \exp(-x^k)$. It can be shown that the moment
problem is determinate (i.e. the distribution determined by the moments in a
unique way) if the density vanishes at infinity at a sufficiently strong degree.
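
The vanishing of the moments can be checked numerically for one admissible choice of the constant. The sketch below is an illustration only and rests on assumptions: it takes k = 1/4, so that $\lambda = \tan(\pi/4) = 1$ and $x = u^4$, truncates the integration range at a hypothetical upper limit, and uses a plain composite Simpson rule; all function names are made up for the example. The signed nth moment $4 \int u^{4n+3} e^{-u} \sin u \, du$ is compared with the corresponding absolute moment.

```python
import math

def simpson(func, a, b, n):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    odd = math.fsum(func(a + i * h) for i in range(1, n, 2))
    even = math.fsum(func(a + i * h) for i in range(2, n, 2))
    return h / 3.0 * (func(a) + func(b) + 4.0 * odd + 2.0 * even)

def moment_ratio(n, upper=80.0, steps=80000):
    """|n-th moment of the density| divided by its n-th absolute moment (k = 1/4)."""
    p = 4 * n + 3  # x**n dx becomes 4 * u**(4n+3) du under x = u**4

    def signed(u):
        return 4.0 * u ** p * math.exp(-u) * math.sin(u)

    def absolute(u):
        return abs(signed(u))

    return abs(simpson(signed, 0.0, upper, steps)) / simpson(absolute, 0.0, upper, steps)

for n in (0, 1, 2):
    # Each ratio is zero up to quadrature and rounding error.
    print(n, moment_ratio(n))
```

For k = 1/4 the result can also be seen directly: the complex integral equals $\Gamma(4n+4)/(1-i)^{4n+4}$, and $(1-i)^4 = -4$ is real.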

From the standpoint of statistical theory two distributions with the same
moments throughout may be considered as equivalent. This justifies the
terminology used in Theorem II. On the other hand, Theorem I is independent of
this restriction: The asymptotic distribution of the statistical function $f\{S_n(x)\}$
is under the given conditions identical with that of the corresponding quantic
of mth order. A detailed discussion of the case m = 2 will be given in Part III.
Here follow some illustrations for the general case.

4. Illustrations. The existence of asymptotic distributions of higher types 
can be exemplified in a comparatively simple way if we start from any known 
asymptotic distribution of a statistical function. 

Let us assume that $g\{V(x)\}$ is a function fulfilling the condition

(28) $g\{\bar V_n(x)\} = 0$

for all n, and that the asymptotic c.d.f. for $g\{S_n(x)\}$ is known. There will be
some positive integer r such that

(29) $\operatorname{Prob}[\, g\{S_n(x)\} \le z\, n^{-r/2} \,] \sim \Phi_n(z).$

If, for instance, g is a linear statistical function r will be 1 and, under
well-known conditions, $\Phi_n(z)$ a normal (Gaussian) c.d.f. with finite variance
depending on n.

Now, let f be an ordinary function of g and thus another statistical function
which may be denoted by $f\{V(x)\}$. According to the rules of differentiation
we have

(30) $f'\{V(x), y\} = \dfrac{df}{dg}\; g'\{V(x), y\}$

and analogous relations can be derived for the derivatives of higher order. In
particular, the following statement, valid in ordinary differential calculus, holds
true: If $g\{V(x)\}$ has derivatives of every order and if the first s derivatives of f
with respect to g vanish at some point $g = g\{V_1(x)\}$ then also the s first derivatives
of f with respect to $V(x)$ will be zero at $V(x) = V_1(x)$. In this way we can
devise statistical functions, with vanishing derivatives, for which the asymptotic
distribution is known.

For the sake of simplicity we may assume that (29) holds with r = 1 and
that $f(g)$ is a monotone increasing function, given in the form

(31) $f(g) = g^s [1 + \alpha(g)]$

with s a positive integer, and the inverse function

(31') $g(f) = f^{1/s} [1 + \beta(f)]$

where $\beta(f)$ goes to zero with $f \to 0$. Then, from (29):

(32) $\operatorname{Prob}[\, f\{S_n(x)\} \le z\, n^{-s/2} \,] \sim \Phi_n(z')$

if z and z' are connected by

$n^{-1/2} z' = g(n^{-s/2} z) = n^{-1/2} z^{1/s} [1 + \beta(n^{-s/2} z)].$

It follows that

$z' - z^{1/s} \to 0$

and if $\Phi_n(z')$ is supposed to be continuous, (32) becomes

(33) $\operatorname{Prob}[\, f\{S_n(x)\} \le z\, n^{-s/2} \,] \sim \Phi_n(z^{1/s}).$

This is a distribution of type s.

Take as an example for g the arithmetical mean

(34) $g\{S_n(x)\} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} - a_n$

where $x_1, x_2, \dots, x_n$ are the observed values and $a_n$ is the arithmetical mean
of the mean values of $V_\nu(x)$. Then, under certain restrictions for the $V_\nu(x)$,
there exists a bounded sequence $h_n$ so that

$\operatorname{Prob}[\sqrt n\, g \le z] \sim \Phi_n(z) = \dfrac{h_n}{\sqrt\pi} \int_{-\infty}^{z} e^{-h_n^2 u^2}\, du.$

Now if we choose

$f = 6 (g - \sin g) = g^3 \Big(1 - \dfrac{g^2}{20} + \cdots \Big)$

the asymptotic distribution of f will be given by

$\operatorname{Prob}[n \sqrt n\, f \le z] \sim \Phi_n(z^{1/3}) = \dfrac{h_n}{\sqrt\pi} \int_{-\infty}^{z^{1/3}} e^{-h_n^2 u^2}\, du$

with the probability density

$\dfrac{h_n}{3 \sqrt\pi}\; z^{-2/3}\, e^{-h_n^2 z^{2/3}}.$

Similar examples can be drawn from the asymptotic distribution of $\chi^2$ if one
asks for the distribution of appropriate functions of $\chi^2$, etc.
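
The scaling behind this example can be made concrete with a small deterministic computation. The sketch below is only an illustration with hypothetical function names: for a fixed value z of $\sqrt n\, g$, it evaluates $n \sqrt n\, f$ with $f = 6(g - \sin g)$ and shows that it approaches $z^3$, which is why the c.d.f. of $n\sqrt n\, f$ at z tends to that of $\sqrt n\, g$ at $z^{1/3}$.

```python
import math

def f(g):
    """The function of the example: 6 (g - sin g) = g**3 (1 - g**2 / 20 + ...)."""
    return 6.0 * (g - math.sin(g))

def scaled(z, n):
    """n * sqrt(n) * f(g) evaluated at g = z / sqrt(n)."""
    return n * math.sqrt(n) * f(z / math.sqrt(n))

z = 1.5
for n in (10, 1000, 100000):
    # The second column approaches the third as n grows.
    print(n, scaled(z, n), z ** 3)
```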

PART III. SECOND-TYPE ASYMPTOTIC DISTRIBUTION 

1. Statement of the problem. We now propose to study the asymptotic
distribution of a quantic of second order as defined in eq. (6) of Part I. It
has been shown in Part II that this covers the case of any statistical function
of which the first but not the second derivative at the critical point vanishes.

Independently of what was said before, the problem can be stated in the
following way. Given a function $\psi(x, y)$ and a sequence of cumulative
distribution functions $V_1(x), V_2(x), V_3(x), \dots$. Let $\bar V_n(x)$ be the arithmetical mean of
$V_1(x), V_2(x), \dots, V_n(x)$ and $S_n(x)$ the repartition of a sample $z_1, z_2, \dots, z_n$
drawn from the collective with the distribution element $dV_1(z_1)\, dV_2(z_2) \cdots$
$dV_n(z_n)$, that is: $n S_n(x)$ is the number of those of the observed values
$z_1, z_2, \dots, z_n$ that are smaller than or equal to x. Then the quantity

(1) $f = \iint \psi(x, y)\, dT_n(x)\, dT_n(y)$, where $T_n(x) = S_n(x) - \bar V_n(x)$

is determined by the observations $z_1, z_2, \dots, z_n$. We ask for the distribution
of f at large values of n.

Without loss of generality, the function $\psi(x, y)$ can be supposed to be
symmetrical. If, in particular, $\psi(x, y) = \psi(x)\, \psi(y)$, the quantity f becomes the
square of

(2) $\int \psi(x)\, dT_n(x) = \dfrac{1}{n} \sum_{\nu=1}^{n} \psi(z_\nu) - \int \psi(x)\, d\bar V_n(x)$

and its asymptotic distribution can be computed in the manner shown in the last
section of Part I. Another example would be

(3) $\psi(x, y) = g(x) \quad (x \le y)$
     $\psi(x, y) = g(y) \quad (x \ge y).$

In this case, integration by parts shows that

(4) $f\{S_n(x)\} = \int g'(x)\, T_n^2(x)\, dx$

where $g'$ is the derivative of g. This is the statistical function that takes the
place of $\chi^2$ in continuous problems. See Introduction eq. (3).

Note that the "excess" $T_n(x)$ vanishes at $x = \pm\infty$ and that for sufficiently
large x the increment $dT_n(x)$ equals $-d\bar V_n(x)$. Thus, conditions for the
existence of the integrals in (1), (2), (4), etc. can be expressed in terms of the given
functions $\psi(x, y)$ and $V_\nu(x)$.

We shall first study the special case that implies so-called discontinuous chance
variables. In our terminology it is the function $\psi(x, y)$ that has to be specified.
Let $I_1, I_2, \dots, I_k$ be k mutually exclusive one-dimensional intervals (or groups
of intervals) and $I_{k+1}$ their complement. Assume that $\psi(x, y)$ has a constant
value $\psi_{\iota\kappa}$ when x falls in $I_\iota$ and y falls in $I_\kappa$ $(\iota, \kappa = 1, 2, \dots, k + 1)$. The increments
of $S_n(x)$, $\bar V_n(x)$, $T_n(x)$ in the interval $I_\kappa$ will be called $p'_\kappa$, $\bar p_\kappa$, $\xi_\kappa$ respectively.
Clearly, $n p'_\kappa$ is the number of observed values falling in $I_\kappa$, $n \bar p_\kappa$ is the expected
number of such values, and $n(p'_\kappa - \bar p_\kappa) = n \xi_\kappa$ the excess of observed over expected
numbers. Note that the given distributions $V_\nu(x)$ determine increments $p_{\nu\kappa}$
in the interval $I_\kappa$ and that

(5) $\bar p_\kappa = \dfrac{1}{n} (p_{1\kappa} + p_{2\kappa} + \cdots + p_{n\kappa}).$

Since the sum of all $\xi_\kappa$ must be zero we can replace $\xi_{k+1}$ by

(6) $\xi_{k+1} = -\xi_1 - \xi_2 - \cdots - \xi_k.$

Thus, the integral (1) can now be written as a sum of $k^2$ terms

(7) $f\{S_n(x)\} = \sum_{\iota,\kappa=1}^{k} \psi_{\iota\kappa}\, \xi_\iota \xi_\kappa$

like that introduced in the second eq. (8) of Part I.

Our next task will be to find the asymptotic distribution of (7) which depends
on the matrix $\psi_{\iota\kappa}$, $(\iota, \kappa = 1, 2, \dots, k)$, and on the succession of probability
values $p_{\nu\kappa}$, $(\nu = 1, 2, 3, \dots;\ \kappa = 1, 2, \dots, k)$. The matrix $\psi_{\iota\kappa}$ in k variables
will be supposed to be symmetrical.


2. Characteristic function. We define our chance variable as

(8) $x = \dfrac{n}{2} f = \dfrac{n}{2} \sum_{\iota,\kappa} \psi_{\iota\kappa}\, \xi_\iota \xi_\kappa.$

All summations, here and in what follows, are to be extended from 1 to k if
not otherwise indicated. If $P_n(x)$ is the c.d.f. of x, that is

(9) $\operatorname{Prob}\{\tfrac{n}{2} f \le x\} = P_n(x)$

the characteristic function (c.f.) is defined by

(10) $Q_n(u) = E\{e^{xui}\} = \int e^{xui}\, dP_n(x).$

In order to compute $Q_n$ we assume that the quadratic form (8) is transformed,
by a linear transformation, into a sum of squares. Using appropriate (in general
complex) coefficients $a_{\iota\kappa}$ one can write

(11) $x = \dfrac{n}{2} (\eta_1^2 + \eta_2^2 + \cdots + \eta_k^2), \qquad \eta_\iota = \sum_\kappa a_{\iota\kappa}\, \xi_\kappa.$

(The form $\psi_{\iota\kappa}$ is here supposed to be non-singular which, however, means no
loss of generality.) It will be seen later that explicit knowledge of the $a_{\iota\kappa}$
is not needed.

Now, for any real or complex y, the identity holds

(12) $e^{y^2/2} = (2\pi)^{-1/2} \int_{-\infty}^{\infty} e^{-t^2/2 + y t}\, dt.$
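
The Gaussian identity $e^{y^2/2} = (2\pi)^{-1/2} \int e^{-t^2/2 + yt}\, dt$, valid for complex y, is the step that linearizes the squares in the exponent. A minimal numerical check is sketched below; the truncation width, the step count and the function name are assumptions made for the illustration, and the trapezoidal rule is used simply because it is very accurate for rapidly decaying analytic integrands.

```python
import cmath
import math

def gaussian_identity_rhs(y, half_width=12.0, steps=2400):
    """Trapezoidal estimate of (2 pi)^(-1/2) * integral of exp(-t^2/2 + y t) dt."""
    h = 2.0 * half_width / steps
    total = 0j
    for i in range(steps + 1):
        t = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * cmath.exp(-0.5 * t * t + y * t)
    return total * h / math.sqrt(2.0 * math.pi)

for y in (0.8, 0.7 + 0.4j, 1.5j):
    # Left-hand side exp(y^2 / 2) next to the numerical right-hand side.
    print(y, cmath.exp(y * y / 2.0), gaussian_identity_rhs(y))
```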

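If we write v for $\sqrt{ui}$ and replace in (12) successively y by $v\sqrt n\, \eta_1$, $v\sqrt n\, \eta_2$, $\dots$
we find

(13) $e^{xui} = (2\pi)^{-k/2} \iint \cdots \int \exp\big[ -\tfrac12 \sum t_\kappa^2 + v \sqrt n \sum \eta_\kappa t_\kappa \big]\, dt_1\, dt_2 \cdots dt_k$

where

(14) $\sum_\kappa \eta_\kappa t_\kappa = \sum_\kappa \xi_\kappa z_\kappa, \qquad z_\kappa = \sum_\iota a_{\iota\kappa}\, t_\iota, \quad (\kappa = 1, 2, \dots, k).$

Since the first exponential factor in the integrand is a constant with respect
to the chance variable, the expected value of $e^{xui}$ is given by

(15) $Q_n(u) = E\{e^{xui}\} = (2\pi)^{-k/2} \iint \cdots \int \exp\big[ -\tfrac12 \sum t_\kappa^2 \big]\, G_n\, dt_1\, dt_2 \cdots dt_k$

with

(16) $G_n = E\{\exp[\, v \sqrt n \sum \xi_\kappa z_\kappa \,]\}.$

In order to find $G_n$ we consider the following n collectives $C_1, C_2, \dots, C_n$
with discontinuous, (k + 1)-valued distributions: In $C_\nu$ the label values are
$z_1, z_2, \dots, z_k$, and $z_{k+1}$, with $z_{k+1} = 0$, their probabilities $p_{\nu 1}, p_{\nu 2}, \dots, p_{\nu, k+1}$.

The c.f. of this distribution at the point $-iv/\sqrt n$ is

(17) $\sum_{\kappa=1}^{k+1} p_{\nu\kappa}\, e^{v z_\kappa / \sqrt n}.$

If we multiply the n expressions (17) for $\nu = 1, 2, \dots, n$ the product will be,
according to well-known rules of probability calculus, the c.f. for the
distribution of the sum of the n label components in the collective formed by combining
$C_1, C_2, \dots, C_n$. This sum is

$\sum_\kappa n p'_\kappa z_\kappa$

and therefore,

(18) $E\Big\{ \exp\Big[ \dfrac{v}{\sqrt n} \sum_\kappa n p'_\kappa z_\kappa \Big] \Big\} = \prod_{\nu=1}^{n} \Big[ \sum_{\kappa=1}^{k+1} p_{\nu\kappa}\, e^{v z_\kappa / \sqrt n} \Big].$

Multiplying both sides of this equation by

(19) $\exp\big[ -v \sqrt n \sum_\kappa \bar p_\kappa z_\kappa \big] = \exp\Big[ -\dfrac{v}{\sqrt n} \sum_{\nu=1}^{n} \bar z_\nu \Big]$

and using the abbreviation

(20) $\bar z_\nu = \sum_\kappa p_{\nu\kappa}\, z_\kappa$

we arrive at

(21) $G_n = E\{\exp[\, v \sqrt n \sum \xi_\kappa z_\kappa \,]\} = F_1 F_2 \cdots F_n$

with

(22) $F_\nu = \sum_{\kappa=1}^{k+1} p_{\nu\kappa}\, e^{v (z_\kappa - \bar z_\nu)/\sqrt n}.$

This solves the problem: By inserting (21), (22) in (15) and carrying out the
integration with respect to $t_1, t_2, \dots, t_k$ one has expressed $Q_n(u)$ in terms
of the given $p_{\nu\kappa}$ and of the coefficients $a_{\iota\kappa}$ which link the $z_\kappa$ to the $t_\kappa$. This
expression for $Q_n(u)$ holds for all n.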

We have still to show that the integral (15) exists, at least for small $|u|$ or
$|v|$, independently of the value of n. For this purpose we develop $F_\nu$, as given
in (22), in the neighborhood of v = 0. At this point $F_\nu = 1$ and the first
derivative vanishes by virtue of (20). We thus have

(23) $F_\nu = 1 + \dfrac{v^2}{2n} \sum_{\kappa=1}^{k+1} p_{\nu\kappa} (z_\kappa - \bar z_\nu)^2\, e^{\vartheta_\nu v (z_\kappa - \bar z_\nu)/\sqrt n}$

with $|\vartheta_\nu| \le 1$. From the definition of $z_\kappa$ in (14) it follows that the ratio $|z_\kappa|/T$
with

(24) $T^2 = t_1^2 + t_2^2 + \cdots + t_k^2$

has an upper bound depending on the $a_{\iota\kappa}$ only. On the other hand, according
to (20), $\bar z_\nu$ is a weighted mean of the $z_\kappa$ and, therefore, $|z_\kappa - \bar z_\nu|$ will not surpass
twice the maximum $|z_\kappa|$:

(25) $|z_\kappa - \bar z_\nu| < a T$

where a is a positive function of the coefficients $a_{\iota\kappa}$ which, in turn, are
determined by the $\psi_{\iota\kappa}$. Introducing (25) in (23) we find

$|F_\nu| < 1 + \dfrac{|v|^2}{2n}\, a^2 T^2\, e^{|v| a T/\sqrt n} \le e^{|v|^2 a^2 T^2 / n}$

and, finally, from (21):

(26) $|G_n| < e^{|v|^2 a^2 T^2} = e^{|u| a^2 T^2}.$

Thus it is seen that for

(27) $|u| < \dfrac{1}{2a^2}$ or $1 - 2a^2 |u| \ge \eta^2 > 0$

the integral (15) admits the upper bound

(28) $|Q_n(u)| < (2\pi)^{-k/2} \iint \cdots \int e^{-\eta^2 T^2/2}\, dt_1\, dt_2 \cdots dt_k = \eta^{-k}.$

It also follows that the contribution to $Q_n(u)$ from the region $T > T_0$ tends to
zero with increasing $T_0$, uniformly with respect to n and with respect to u in
the region $|u| < 1/(2a^2)$.


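3. Asymptotic value of $Q_n(u)$. If the quantity $F_\nu$ introduced in (22) is
considered as a function of $z_1/\sqrt n, z_2/\sqrt n, \dots, z_k/\sqrt n$, we may write

(29) $F_\nu(z_1, z_2, \dots, z_k) = \sum_{\kappa=1}^{k+1} p_{\nu\kappa}\, e^{v (z_\kappa - \bar z_\nu)}.$

Here, $\bar z_\nu$ is defined by (20) and, on the right-hand side, $z_{k+1}$ is zero. These
functions $F_\nu(z_1, z_2, \dots, z_k)$ for $\nu = 1, 2, 3, \dots$ have all the properties required
in Lemma C of Part I: At the point $z_1 = z_2 = \cdots = z_k = 0$ one has $F_\nu = 1$,
the first derivatives are

$\dfrac{\partial F_\nu}{\partial z_\iota} = v p_{\nu\iota} - v p_{\nu\iota} \sum_{\kappa=1}^{k+1} p_{\nu\kappa} = 0$

and the second derivatives

(30) $\dfrac{\partial^2 F_\nu}{\partial z_\iota^2} = v^2 p_{\nu\iota} (1 - p_{\nu\iota}), \qquad \dfrac{\partial^2 F_\nu}{\partial z_\iota\, \partial z_\kappa} = -v^2 p_{\nu\iota}\, p_{\nu\kappa} \quad (\iota \ne \kappa).$

The third derivatives are certainly bounded in any finite region of the z-space,
and this means also in any finite region of the t-space.

The matrix of the second derivatives except for the factor $v^2$ is exactly that
defined in eq. (19) of Part I:

(31) $P^{(\nu)}_{\iota\kappa} = p_{\nu\iota}\, \delta_{\iota\kappa} - p_{\nu\iota}\, p_{\nu\kappa}$

and the arithmetical means of the derivatives form the matrix in eq. (23) of
Part I:

(31') $P_{\iota\kappa} = \dfrac{1}{n} \sum_{\nu=1}^{n} p_{\nu\iota}\, \delta_{\iota\kappa} - \dfrac{1}{n} \sum_{\nu=1}^{n} p_{\nu\iota}\, p_{\nu\kappa}.$

Applying Lemma C we find

(32) $G_n = G_n\Big( \dfrac{z_1}{\sqrt n}, \dfrac{z_2}{\sqrt n}, \dots, \dfrac{z_k}{\sqrt n} \Big) \sim \exp\Big[ \dfrac{v^2}{2} \sum_{\iota,\kappa} P_{\iota\kappa}\, z_\iota z_\kappa \Big].$

This is valid in any finite t-region. Since it has been shown at the end of the
foregoing section that, for small $|v|$, the outside contribution to the integral
(15) converges uniformly (for all n) towards zero, we are allowed to introduce
(32) in (15). Writing

(33) $\sum_{\iota,\kappa} P_{\iota\kappa}\, z_\iota z_\kappa = \sum_{\iota,\kappa} \gamma_{\iota\kappa}\, t_\iota t_\kappa$, whereby $\gamma_{\iota\kappa} = \sum_{\lambda,\mu} a_{\iota\lambda}\, P_{\lambda\mu}\, a_{\kappa\mu}$

equation (15) becomes

(34) $Q_n(u) \sim (2\pi)^{-k/2} \iint \cdots \int \exp\big[ -\tfrac12 \sum t_\kappa^2 + \tfrac12 ui \sum \gamma_{\iota\kappa}\, t_\iota t_\kappa \big]\, dt_1\, dt_2 \cdots dt_k.$

Now, it is well known that if $m_{\iota\kappa}$ is any positive definite matrix with the
determinant $|m_{\iota\kappa}|$, then

(35) $(2\pi)^{-k/2} \iint \cdots \int \exp\big[ -\tfrac12 \sum m_{\iota\kappa}\, t_\iota t_\kappa \big]\, dt_1\, dt_2 \cdots dt_k = \dfrac{1}{\sqrt{|m_{\iota\kappa}|}}.$

This is likewise true if the matrix $m_{\iota\kappa}$, which we also call M, has the form $M =$
$M_1 - \lambda M_2$ where $M_1$ is positive definite, $M_2$ arbitrary (complex) and $|\lambda|$
sufficiently small. Thus, the integration formula (35) applies to (34) and the result
is reached, for small $|u|$:

(36) $Q_n(u) \sim Q(u) = \dfrac{1}{\sqrt{D(ui)}}$ with $D(\lambda) = |\delta_{\iota\kappa} - \lambda \gamma_{\iota\kappa}|.$

If the $a_{\iota\kappa}$ which transform the given quadric into a sum of squares are known,
(36) with (33) supply the solution of our problem.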

The formula (36) is susceptible of several useful transformations. Let us
write A for the matrix $a_{\iota\kappa}$, A' for the transposed matrix, and $\Psi$, P, $\Gamma$, I
respectively for the matrices $\psi_{\iota\kappa}$, $P_{\iota\kappa}$, $\gamma_{\iota\kappa}$, $\delta_{\iota\kappa}$. Then, obviously

(37) $\Psi = A'A, \quad \Gamma = A P A', \quad M = I - ui\, \Gamma.$

If we multiply M by A' to the left and by A to the right, we obtain

(38) $A' M A = A' I A - ui\, A' A P A' A = \Psi - ui\, \Psi P \Psi.$

In this operation the determinant of M is multiplied by $|\psi_{\iota\kappa}|$. Thus $D(\lambda)$
can be written as

(39) $D(\lambda) = \dfrac{|\Psi - \lambda\, \Psi P \Psi|}{|\Psi|}$, the elements of $\Psi P \Psi$ being $\sum_{\rho,\sigma} \psi_{\iota\rho}\, P_{\rho\sigma}\, \psi_{\sigma\kappa}.$

Here, the knowledge of the $a_{\iota\kappa}$ is no longer required.

If the matrix (38) is multiplied twice by $\Psi^*$, the inverse of $\Psi$, we find $\Psi^* - ui\, P$
and, therefore,

(40) $D(\lambda) = |\psi_{\iota\kappa}| \times |\psi^*_{\iota\kappa} - \lambda P_{\iota\kappa}|.$

As P is positive definite and $\Psi^*$ real, it follows that all roots of $D(\lambda)$,
the "Eigenwerte" of $\Gamma$, are real numbers. Therefore, $D^{-1/2}(ui)$ is a regular
function along the real axis in the u-plane. Thus, (36) which was proved so
far for small $|u|$ only remains valid for all real values of u: The c.f. of the
asymptotic distribution is represented by $D^{-1/2}(ui)$ for all real u-values.

Multiplying (38) only once by $\Psi^*$ we obtain one of the two forms

(41) $I - ui\, \Psi P$ or $I - ui\, P \Psi$

which lead to

(42) $D(\lambda) = |\delta_{\iota\kappa} - \lambda s_{\iota\kappa}| = |\delta_{\iota\kappa} - \lambda s_{\kappa\iota}|, \qquad s_{\iota\kappa} = \sum_\mu \psi_{\iota\mu}\, P_{\mu\kappa}.$

Although this formula has been derived by means of $\Psi^*$ it can be seen by
continuity considerations that it remains valid whatever the (symmetric) matrix
$\Psi$ is. The formula makes it clear that the asymptotic distribution of the
quadric is completely determined by the "Eigenwerte" of the matrix
$S = \Psi P$. This bears out our second main theorem in Chapter II, as far as
quantics of the form (8) are concerned. It will be seen in sec. 5 how (42) applies
to the continuous case.
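
For a form in two excess values the determinant $D(\lambda) = |\delta_{\iota\kappa} - \lambda s_{\iota\kappa}|$ with $S = \Psi P$ can be expanded by hand into $1 - \lambda\, \mathrm{tr}(\Psi P) + \lambda^2 |\Psi|\,|P|$. The sketch below compares the two expressions; it is an illustration only, with assumed cell probabilities $p_{\nu\iota}$, assumed coefficients A, B, C for the symmetric matrix $\Psi$, and hypothetical function names. The matrix P is built as the arithmetical mean of $p_{\nu\iota}\delta_{\iota\kappa} - p_{\nu\iota}p_{\nu\kappa}$.

```python
def mean_matrix_P(probs):
    """P[i][j] = mean over nu of p[nu][i] * delta_ij - p[nu][i] * p[nu][j]."""
    n, k = len(probs), len(probs[0])
    return [[sum(p[i] * (1.0 if i == j else 0.0) - p[i] * p[j] for p in probs) / n
             for j in range(k)] for i in range(k)]

def D_direct(lam, A, B, C, P):
    """Determinant of I - lam * Psi * P, computed from the 2 x 2 matrix itself."""
    psi = [[A, B], [B, C]]
    s = [[sum(psi[i][r] * P[r][j] for r in range(2)) for j in range(2)]
         for i in range(2)]
    m = [[(1.0 if i == j else 0.0) - lam * s[i][j] for j in range(2)]
         for i in range(2)]
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def D_expanded(lam, A, B, C, P):
    """1 - lam * trace(Psi P) + lam^2 * det(Psi) * det(P), for symmetric P."""
    trace = A * P[0][0] + 2.0 * B * P[0][1] + C * P[1][1]
    dets = (A * C - B * B) * (P[0][0] * P[1][1] - P[0][1] ** 2)
    return 1.0 - lam * trace + lam * lam * dets

probs = [(0.2, 0.3), (0.1, 0.5), (0.3, 0.3)]   # assumed p_{nu 1}, p_{nu 2}
P = mean_matrix_P(probs)
A, B, C = 2.0, -0.5, 1.5                        # assumed coefficients of the form

for lam in (-1.3, 0.0, 0.7, 2.5):
    print(lam, D_direct(lam, A, B, C, P), D_expanded(lam, A, B, C, P))
```

Only the trace and determinant of $\Psi P$ enter, which is the two-variable instance of the remark that the asymptotic distribution depends on the "Eigenwerte" of $\Psi P$ alone.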

We, finally, apply to (36) a transformation that is valid only if P has an inverse
matrix $P^*$. (As shown in Part I, sec. 4 this is not the case if the k intervals to
which the subscripts $1, 2, \dots, k$ refer cover the whole range of the variables
$x_1, x_2, \dots, x_n$.) Multiplying (41) by $P^*$ we find the matrix $P^* - ui\, \Psi$ and
thus

(43) $D(\lambda) = |P_{\iota\kappa}| \times |P^*_{\iota\kappa} - \lambda \psi_{\iota\kappa}|.$

This is equivalent to

(44) $Q(u) = (2\pi)^{-k/2}\, |P^*_{\iota\kappa}|^{1/2} \iint \cdots \int \exp\big[ -\tfrac12 \sum P^*_{\iota\kappa}\, \xi_\iota \xi_\kappa + \tfrac12 ui \sum \psi_{\iota\kappa}\, \xi_\iota \xi_\kappa \big]\, d\xi_1\, d\xi_2 \cdots d\xi_k.$

According to the definition of the characteristic function eq. (44) can be
interpreted as stating that

(45) $(2\pi)^{-k/2}\, |P^*_{\iota\kappa}|^{1/2} \exp\big[ -\tfrac12 \sum P^*_{\iota\kappa}\, \xi_\iota \xi_\kappa \big]$

is the asymptotic probability density for the simultaneous occurrence of
$\sqrt n\, \xi_1, \sqrt n\, \xi_2, \dots, \sqrt n\, \xi_k$. The expression (45) can be arrived at by applying the Central
Limit Theorem to the case of k independent chance variables. Since, however,
$P^*$ does not exist in general, eq. (44) would not be a suitable point of departure
for developing the theory that concerns us here.

4. Asymptotic value of $P_n(x)$, illustrations. The relationship between the
c.f. and the c.d.f. of a distribution is well known and need not be discussed here.
We use, in this section, two aspects of this relationship only.
First, the continuity theorem, first proved by G. Pólya [5], stating that if the
c.f. $Q_n(u)$ tend towards a limiting function $Q(u)$, the corresponding c.d.f. $P_n(x)$
tend towards the $P(x)$ that corresponds to $Q(u)$. Second, the additivity:
if $Q(u)$ is of the form $\alpha Q'(u) + \beta Q''(u)$ with $\alpha + \beta = 1$, then
$P(x) = \alpha P'(x) + \beta P''(x)$ with the $P'(x)$, $P''(x)$ corresponding to $Q'(u)$ and $Q''(u)$.

Three groups of examples will illustrate the theory.

a) Let us first consider a function of two excess values $\xi_1$, $\xi_2$ only

(46) $x = \frac{n}{2} f = \frac{n}{2}\,(A\xi_1^2 + 2B\xi_1\xi_2 + C\xi_2^2)$

so that $\psi_{11} = A$, $\psi_{12} = \psi_{21} = B$, $\psi_{22} = C$. The product $P\Psi$ is

(47) $\begin{pmatrix} AP_{11} + BP_{12} & BP_{11} + CP_{12} \\ AP_{21} + BP_{22} & BP_{21} + CP_{22} \end{pmatrix}$

and the determinant of $I - \lambda P\Psi$

(48) $D(\lambda) = 1 - \lambda[AP_{11} + 2BP_{12} + CP_{22}] + \lambda^2 (AC - B^2)(P_{11}P_{22} - P_{12}^2)$.

If $\lambda_1$, $\lambda_2$ are the two real roots of $D(\lambda) = 0$, the asymptotic probability density is

(49) $\frac{dP(x)}{dx} = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{e^{-uix}\,du}{\sqrt{(1 - ui/\lambda_1)(1 - ui/\lambda_2)}}$.

We are particularly interested in the case that $P$ is "complete," i.e. a matrix
with all horizontal and vertical sums vanishing. Then $P_{11} = -P_{12} = P_{22} = \overline{p_1p_2}$,
$P_{11}P_{22} - P_{12}^2 = 0$, and $D(\lambda) = 0$ has the single root
$\lambda_1 = 1/[(A - 2B + C)\,\overline{p_1p_2}]$. Here, instead of (49) we have

(50) $\frac{dP(x)}{dx} = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{e^{-uix}\,du}{\sqrt{1 - ui/\lambda_1}}$.

This is, with respect to $\sqrt{|x|}$, a Gauss distribution with the variance
$|A - 2B + C|\,\overline{p_1p_2}/2$.

If, in addition to the assumption that $P$ is "complete" (i.e. in the present case
that $p_{\nu 1} + p_{\nu 2} = 1$ for all $\nu$) the further assumption is made that the two
intervals $I_1$ and $I_2$ cover the whole range of the original chance variables $x_1, x_2,
x_3, \ldots$, one would have also $\xi_1 + \xi_2 = 0$ and from (46)

$x = \frac{n}{2}\,(A - 2B + C)\,\xi_1^2$.

In this case, $\sqrt{|x|}$ is a linear statistical function and the Central Limit Theorem
leads to the same result as that expressed in (50). It is seen, however, from our
derivation, that (50) holds under wider conditions: If $p_{\nu 1} + p_{\nu 2} = 1$ for all $\nu$,
there may exist another interval $I_3$ within the range of the chance variables
$x_1, x_2, \ldots$ so that $\xi_1 + \xi_2$ is not necessarily zero.
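The step from (48) to the single-root case can be written out as a short worked equation. This is only a sketch: $\bar P$ is an assumed shorthand (not the paper's notation) for the common value $P_{11} = -P_{12} = P_{22}$ of a "complete" $2 \times 2$ matrix, and $\lambda_1 > 0$ is assumed in the last remark.

```latex
\begin{aligned}
P_{11}P_{22} - P_{12}^2 &= \bar P^2 - \bar P^2 = 0, \\
D(\lambda) &= 1 - \lambda\,[A\bar P - 2B\bar P + C\bar P]
            = 1 - \lambda\,(A - 2B + C)\,\bar P, \\
\lambda_1 &= \frac{1}{(A - 2B + C)\,\bar P}, \qquad
Q(u) = D^{-1/2}(ui) = \Bigl(1 - \frac{ui}{\lambda_1}\Bigr)^{-1/2}.
\end{aligned}
```

For $\lambda_1 > 0$ the last expression is the c.f. of $z^2/(2\lambda_1)$ with $z$ a standard normal variable, so that $\sqrt{x}$ is distributed like $|z|/\sqrt{2\lambda_1}$, whose underlying Gauss variance $1/(2\lambda_1) = (A - 2B + C)\bar P/2$ agrees with the variance quoted after (50).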

The latter remark suggests the following general theorem: If $f$ is a function
of the k variables $\xi_1, \xi_2, \ldots, \xi_k$ and $g$ another such function but vanishing when
$\xi_1 + \xi_2 + \cdots + \xi_k = 0$, then $f$ and $f + g$ have the same asymptotic distribution
provided that for each $\nu$ the sum $p_{\nu 1} + p_{\nu 2} + \cdots + p_{\nu k} = 1$. In the case of
quadrics this result is equivalent to the following matrix theorem: If $P$, $\Psi$, $A$
are symmetric matrices, $P$ with all horizontal and vertical sums equal to zero,
$\Psi$ arbitrary, and $A$ of the form $a_{\iota\kappa} = a_\iota + a_\kappa$, then the two products

(51) $P\Psi$ and $P(\Psi + A)$

have the same characteristic roots. This can be proved by the usual methods
of matrix calculus. The matrix $PA$ has all characteristic roots equal to zero.²
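The matrix theorem around (51) can be checked on a small example. The following is a minimal sketch, not the author's proof: the matrices are illustrative choices, and the characteristic polynomial $\det(tI - M)$ is computed exactly with the Faddeev-LeVerrier recursion (a swapped-in technique, chosen only because it needs no external library). Equal polynomials imply equal characteristic roots.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(M):
    """Coefficients c[0..n] of det(t*I - M), by the Faddeev-LeVerrier recursion."""
    n = len(M)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    N = [[Fraction(0)] * n for _ in range(n)]
    for step in range(1, n + 1):
        N = matmul(M, N)
        for i in range(n):
            N[i][i] += c[n - step + 1]
        MN = matmul(M, N)
        c[n - step] = -sum(MN[i][i] for i in range(n)) / step
    return c

# Illustrative data (assumptions, not taken from the paper).
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
k = len(p)
# A "complete" P: p_i*delta_ik - p_i*p_k has zero row and column sums.
P = [[(p[i] if i == j else 0) - p[i] * p[j] for j in range(k)] for i in range(k)]
Psi = [[Fraction(v) for v in row] for row in ([2, 1, 0], [1, 3, -1], [0, -1, 5])]
a = [Fraction(1), Fraction(-2), Fraction(4)]
A = [[a[i] + a[j] for j in range(k)] for i in range(k)]       # a_ik = a_i + a_k
Psi_plus_A = [[Psi[i][j] + A[i][j] for j in range(k)] for i in range(k)]

poly_PPsi = char_poly(matmul(P, Psi))
poly_PPsiA = char_poly(matmul(P, Psi_plus_A))
poly_PA = char_poly(matmul(P, A))

print(poly_PPsi == poly_PPsiA)   # same characteristic polynomial
print(poly_PA)                   # t^3 only: all roots of PA are zero
```

The reason the check succeeds is visible in the structure of $PA$: since the rows of $P$ sum to zero, $PA$ reduces to a rank-one matrix with vanishing trace.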

² A proof of the matrix theorem has meanwhile been published by Alfred Brauer, Bull.
Amer. Math. Soc., Vol. 53 (1947), pp. 605-607.

b) In the definition of Karl Pearson's test function, which is usually called
$\chi^2$, it is presumed that a sample is drawn from the combination of n equal
distributions. In this case all $p_{\nu\iota}$ are equal and coincide with $\bar p_\iota$, which then can
simply be written $p_\iota$:

(52) $P_{\iota\kappa} = p_\iota\delta_{\iota\kappa} - p_\iota p_\kappa$.

The chance variable we now consider will be

(53) $x = \frac{n}{2} f = \frac{n}{2}\sum_{\iota=1}^{k}\frac{\xi_\iota^2}{p_\iota} = \frac{1}{2}\chi^2$.

Thus $\psi_{\iota\kappa} = \delta_{\iota\kappa}/p_\iota$ and the elements of $P\Psi$ are

(53') $(P\Psi)_{\iota\kappa} = \sum_{\mu} P_{\iota\mu}\psi_{\mu\kappa} = \delta_{\iota\kappa} - p_\iota$.

The matrix $I - \lambda P\Psi$ has the elements

$\delta_{\iota\kappa}(1 - \lambda) + \lambda p_\iota$.

If the kth column is subtracted from any one of the others, only two terms
remain, one equal to $1 - \lambda$ and one equal to $-(1 - \lambda)$ in the last row. Thus, the
determinant $D(\lambda)$ includes $(k - 1)$ times the factor $(1 - \lambda)$. On the other hand
$D(\lambda)$ is of degree $(k - 1)$ and has the absolute term 1. Therefore

(54) $D(\lambda) = (1 - \lambda)^{k-1}$.

This supplies the $\chi^2$-distribution with $(k - 1)$ "degrees of freedom"

(55) $Q(u) = (1 - ui)^{-(k-1)/2}$, $\frac{dP}{dx} = \frac{1}{\Gamma\left(\frac{k-1}{2}\right)}\,x^{(k-3)/2}e^{-x}$ $(x \ge 0)$.


Again, our result is slightly more general than that reached in the usual theory.
It includes the case that in addition to the k intervals with the probabilities
$p_1, p_2, \ldots, p_k$ (whose sum is 1) there are other intervals with probability zero.

On the other hand, if to $\chi^2$ a term of the form $n\sum(a_\iota + a_\kappa)\xi_\iota\xi_\kappa$ is added, this
would not change the asymptotic distribution.
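The factorization (54) is easy to confirm for a concrete set of probabilities. The sketch below is illustrative only: the vector `p` is an arbitrary choice, and the determinant of $I - \lambda P\Psi$ (with $P$ from (52) and $\psi_{\iota\kappa} = \delta_{\iota\kappa}/p_\iota$) is evaluated exactly by Gaussian elimination over rationals and compared with $(1 - \lambda)^{k-1}$ at a few values of $\lambda$.

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination over Fractions."""
    M = [row[:] for row in M]
    n = len(M)
    d = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # singular matrix
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            d = -d                      # a row swap flips the sign
        d *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return d

def D(lam, p):
    """det(I - lam*P*Psi) with P_ik = p_i*d_ik - p_i*p_k and Psi_ik = d_ik/p_i."""
    k = len(p)
    P = [[(p[i] if i == j else 0) - p[i] * p[j] for j in range(k)] for i in range(k)]
    # Psi is diagonal, so P*Psi just scales column j of P by 1/p_j.
    PPsi = [[P[i][j] / p[j] for j in range(k)] for i in range(k)]
    return det([[(1 if i == j else 0) - lam * PPsi[i][j] for j in range(k)]
                for i in range(k)])

p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]  # illustrative
checks = [D(lam, p) == (1 - lam) ** (len(p) - 1)
          for lam in (Fraction(1, 3), Fraction(-2), Fraction(5, 7))]
print(checks)
```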

One may ask for other quadratic functions of $\xi_1, \xi_2, \ldots, \xi_k$ whose asymptotic
distribution is given by (55). In particular, one might be interested in a
generalization of $\chi^2$ for the case of unequal original distributions. The answer can easily
be given by introducing the cofactors of order $(k - 1)$ and of order $(k - 2)$ of the
determinant $|P_{\iota\kappa}|$. It was mentioned in sec. 4 of Part I that all cofactors of
order $(k - 1)$, in the case of "complete" $P$, have the same value. It may be
denoted by $\Delta$. The cofactor corresponding to the lines $\iota$, $\kappa$ and the columns
$\lambda$, $\mu$ will be denoted by $\Pi_{\iota\kappa}^{\lambda\mu}$, with $\Pi = 0$ if $\iota = \kappa$ or $\lambda = \mu$. Then, if $l$ is any one
of the integers 1, 2, ..., k

(56) ^ik ~ ^ j k -4 -1 


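is one possible solution. In fact, the product $P\Psi$ has in this case the elements

(57) $(P\Psi)_{\iota\kappa} = \begin{cases} \delta_{\iota\kappa} & \text{for } \iota, \kappa \neq l \\ -1 & \text{for } \iota = l,\ \kappa \neq l \\ 0 & \text{for } \kappa = l. \end{cases}$

The determinant of $I - \lambda P\Psi$ is then seen to equal $(1 - \lambda)^{k-1}$.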

The solution (56), however, is unsymmetrical in the sense that it does not
include any terms with $\xi_l$. A completely symmetrical solution in which all
$\xi$ play the same role is given by

1 * 

(58) = M ^ 

According to (57) the matrix $P\Psi$ now consists of terms $(k - 1)/k$ in the
principal diagonal and $-1/k$ at all other places, that is

(58') $(P\Psi)_{\iota\kappa} = \delta_{\iota\kappa} - \frac{1}{k}$.

In the same way as in the case of (53') it can be seen that the determinant of
$I - \lambda P\Psi$ equals here $(1 - \lambda)^{k-1}$. The asymptotic distribution of the quadric with
the coefficients (58) is, therefore, the $\chi^2$-distribution with $(k - 1)$ degrees of freedom.

If the formula (58) is applied to the case of equal $P^{(\nu)}_{\iota\kappa}$ the corresponding
quadric becomes





that is, $\chi^2$ plus a term vanishing with $\xi_1 + \xi_2 + \cdots + \xi_k$. One can easily modify
(58) so that it leads to $\chi^2$ without any addition.

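c) A third group of examples where the asymptotic density is expressed by
simple functions is that where $D(\lambda)$ is an exact square, that is, all characteristic
roots (except the one that is zero) have even multiplicities. Let us assume
$k = 2m + 1$ and let $\lambda_1, \lambda_2, \ldots, \lambda_m$ be m double roots. Then

(59) $Q(u) = \prod_{\mu=1}^{m}\frac{1}{1 - ui/\lambda_\mu} = \sum_{\mu=1}^{m}\frac{A_\mu}{1 - ui/\lambda_\mu}$

with

(59') $A_\mu = \prod_{\nu \neq \mu}\frac{\lambda_\nu}{\lambda_\nu - \lambda_\mu}$

and therefore

(60) $\frac{dP(x)}{dx} = \sum_{\mu=1}^{m} A_\mu\lambda_\mu e^{-\lambda_\mu x}$, $x \ge 0$.

Assume, for instance, that all original distributions are uniform, that is

(61) $P^{(\nu)}_{\iota\kappa} = P_{\iota\kappa} = \frac{1}{k}\delta_{\iota\kappa} - \frac{1}{k^2}$

and that the quadric $f$ is given in the form (11) with the following $a_{\iota\kappa}$:

(62) $a_{\iota\kappa} = \begin{cases} \sqrt{kc_1} & \text{for } \iota = 1 \\ \sqrt{kc_\iota} & \text{for } \iota > 1,\ \kappa = 1, 2, \ldots, \iota - 1 \\ -(\iota - 1)\sqrt{kc_\iota} & \text{for } \iota > 1,\ \kappa = \iota \\ 0 & \text{for } \iota > 1,\ \kappa = \iota + 1, \iota + 2, \ldots, k. \end{cases}$

Then, the $\gamma_{\iota\kappa}$ as defined in (33) become

$\gamma_{\iota\kappa} = \begin{cases} c_\iota\,\iota(\iota - 1)\,\delta_{\iota\kappa} & \text{for } \iota \text{ or } \kappa > 1 \\ 0 & \text{for } \iota = \kappa = 1 \end{cases}$

and $D(\lambda)$ according to (36) takes the form

(63) $D(\lambda) = |\delta_{\iota\kappa} - \lambda\gamma_{\iota\kappa}| = \prod_{\iota=2}^{k}[1 - \lambda c_\iota\,\iota(\iota - 1)]$.

In other terms, for the quadric

$f = kc_1(\xi_1 + \cdots + \xi_k)^2 + kc_2(\xi_1 - \xi_2)^2 + kc_3(\xi_1 + \xi_2 - 2\xi_3)^2 + \cdots + kc_k[\xi_1 + \xi_2 + \cdots + \xi_{k-1} - (k - 1)\xi_k]^2$

the characteristic $\lambda$-values are $1/[c_\iota\,\iota(\iota - 1)]$.

Now, to obtain the case of m double roots with $k = 2m + 1$ we have simply
to choose

$c_2 = 3c_3$, $3c_4 = 5c_5$, $5c_6 = 7c_7$, ...

The first term on the right-hand side can be entirely omitted in accordance with
what was said in connection with (51). Besides, for the same reason, the
expression can be simplified in various ways by assuming $\xi_1 + \xi_2 + \cdots + \xi_k = 0$.
As a numerical example, take $k = 5$, $c_2 = 3$, $c_3 = 1$, $c_4 = 5$, $c_5 = 3$. Then

$f = 20(\xi_1^2 + \xi_2^2 + \xi_3^2 + 20\xi_4^2 + 20\xi_5^2 - \xi_1\xi_2 - \xi_2\xi_3 - \xi_3\xi_1 + 10\xi_4\xi_5)$

leads to the characteristic values $\lambda = 1/6$ and $1/60$ and the asymptotic density
becomes

$\frac{dP}{dx} = \frac{1}{54}\left(e^{-x/60} - e^{-x/6}\right)$.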
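The numerical example can be verified directly from the coefficients of the quadric. The sketch below assumes the uniform case $P_{\iota\kappa} = \delta_{\iota\kappa}/5 - 1/25$ and reads the symmetric matrix $\Psi$ off the expression for $f$ with $k = 5$ (each cross-product coefficient is split evenly between $\psi_{\iota\kappa}$ and $\psi_{\kappa\iota}$). It checks that $\det(I - \lambda P\Psi)$ has the double roots $1/6$ and $1/60$; the helper names are illustrative.

```python
from fractions import Fraction
import math

def det(M):
    """Exact determinant by Gaussian elimination over Fractions."""
    M = [row[:] for row in M]
    n = len(M)
    d = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            d = -d
        d *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return d

k = 5
P = [[Fraction(int(i == j), k) - Fraction(1, k * k) for j in range(k)] for i in range(k)]

# Psi for f = 20(x1^2 + x2^2 + x3^2 + 20 x4^2 + 20 x5^2 - x1x2 - x2x3 - x3x1 + 10 x4x5)
Psi = [[Fraction(0)] * k for _ in range(k)]
for i, v in enumerate([20, 20, 20, 400, 400]):
    Psi[i][i] = Fraction(v)
for i, j, v in [(0, 1, -10), (1, 2, -10), (0, 2, -10), (3, 4, 100)]:
    Psi[i][j] = Psi[j][i] = Fraction(v)

PPsi = [[sum(P[i][m] * Psi[m][j] for m in range(k)) for j in range(k)] for i in range(k)]

def D(lam):
    """D(lambda) = det(I - lambda * P * Psi)."""
    return det([[int(i == j) - lam * PPsi[i][j] for j in range(k)] for i in range(k)])

def density(x):
    """Asymptotic density (1/54)(exp(-x/60) - exp(-x/6)) for x >= 0."""
    return (math.exp(-x / 60) - math.exp(-x / 6)) / 54

roots_ok = D(Fraction(1, 6)) == 0 and D(Fraction(1, 60)) == 0
samples = (Fraction(1), Fraction(-1, 2), Fraction(2, 3), Fraction(1, 7), Fraction(3))
poly_ok = all(D(t) == (1 - 6 * t) ** 2 * (1 - 60 * t) ** 2 for t in samples)
print(roots_ok, poly_ok)
```

The density integrates to $(60 - 6)/54 = 1$ over $x \ge 0$, as a probability density must.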

In a similar way other groups of quadrics with asymptotic distributions of
the type (60) can easily be constructed. One may, for instance, use eq. (41)
and make vanish, in the matrix $S = P\Psi$, all elements on one side of the diagonal
so that the roots are immediately known.


5. Transition to the continuous case. In this concluding section, the transition
to the case of a quadric of the form (1) with continuous $\psi(x, y)$ will be
outlined. The formula best fit for this purpose is eq. (36). We therefore
suppose the statistical function $f$ given as

(64) $f = \iint \psi(x, y)\,dT_n(x)\,dT_n(y)$ with $\psi(x, y) = \int \alpha(r, x)\alpha(r, y)\,dr$.

In analogy to (33) we derive

(65) $\gamma(x, y) = \iint \alpha(x, s)\alpha(y, t)\,dU_n(s, t) = \int \alpha(x, s)\alpha(y, s)\,dV_n(s) - \iint \alpha(x, s)\alpha(y, t)\,dW_n(s, t)$.

Since $dW$ is symmetric, this function $\gamma(x, y)$ is symmetric with respect to $x$ and
$y$. If $D(\lambda)$ denotes the Fredholm determinant of the "kernel" $\gamma(x, y)$, we
conclude from (36) that the characteristic function of the asymptotic distribution
of $f$ will be given by

(66) $Q_n(u) = \frac{1}{\sqrt{D(ui)}}$

if certain convergence conditions are satisfied.

In order to establish (66) the main point is to find a sequence of functions
$\psi_1(x, y), \psi_2(x, y), \ldots$, each of the type considered in the foregoing sections and
such that 1) the distribution of the quadric $f_k$ with the coefficients $\psi_k$ tends
towards the distribution of $f$ with increasing $k$ and independently of $n$, and 2)
the determinants $D_k$ corresponding to $\psi_k$ converge towards $D$ as $k$ increases
indefinitely. Using our Lemma A we can replace the first condition by asking
that the expectation of $|f - f_k|$ should go to zero with $k \to \infty$ independently of $n$.

The following assumptions shall be made concerning $f$ and the $V_\nu(x)$: The
function $\alpha(r, x)$ in (64) is continuous and bounded in every finite region; there
exist two positive continuous functions $a(r)$, $\beta(x)$ such that

(67) $|\alpha(r, x)| \le a(r)\beta(x)$

and that the integrals

(68) $\int a^2(r)\,dr = M$, $\int \beta(x)\,dV_\nu(x)$, $\int \beta^2(x)\,dV_\nu(x)$

exist, the latter two being bounded and converging uniformly with respect to
$\nu$. We are going to devise a step function $\psi_k(x, y)$ so that for the corresponding
$f_k$ and any positive $\epsilon_1$

(69) $E\{|f - f_k|\} \le \epsilon_1$.

Let $N$ be an upper bound of the integrals

(70) $\int \beta(x)\,dV_\nu(x) \le N$, $\int \beta^2(x)\,dV_\nu(x) \le N$

and $\epsilon = \epsilon_1/(5 + 8N)$. Choose a value $L$ such that

(71) $\int_{|x| > L} \beta(x)\,dV_\nu(x) \le \frac{\epsilon}{M}$, $\int_{|x| > L} \beta^2(x)\,dV_\nu(x) \le \frac{\epsilon}{M}$

and, calling $B$ the maximum of $\beta(x)$ in $|x| \le L$, another quantity $R$ such that

(72) $\int_{|r| > R} a^2(r)\,dr \le \frac{\epsilon}{2B^2}$.

We subdivide, in the x-y-r-space, the domain $|x| \le L$, $|y| \le L$, $|r| \le R$ in
$k$ equal cells where $k$ is determined by the condition that the absolute value
of the variation of $\alpha(r, x)\alpha(r, y)$ within each cell does not exceed $\epsilon/4R$. Outside
this domain we set $\alpha_k(r, x) = 0$ while inside the domain $\alpha_k(r, x)\alpha_k(r, y)$ shall
equal the value that $\alpha(r, x)\alpha(r, y)$ assumes in the center of the respective cell.
Then $\psi_k(x, y)$ will be defined by

(73) $\psi_k(x, y) = \int \alpha_k(r, x)\alpha_k(r, y)\,dr$.

From the definition of $k$ and from (67) and (72) it follows that

(74) $|\psi(x, y) - \psi_k(x, y)| \le \int_{|r| \le R} |\alpha(r, x)\alpha(r, y) - \alpha_k(r, x)\alpha_k(r, y)|\,dr + \int_{|r| > R} |\alpha(r, x)\alpha(r, y)|\,dr \le 2R\,\frac{\epsilon}{4R} + \beta(x)\beta(y)\int_{|r| > R} a^2(r)\,dr \le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$

as long as $|x| \le L$, $|y| \le L$. If this square is called $(L)$ and the
complementary region $(\bar L)$ we have

(75) $f - f_k = \iint_{(L)} [\psi(x, y) - \psi_k(x, y)]\,dT_n(x)\,dT_n(y) + \iint_{(\bar L)} \psi(x, y)\,dT_n(x)\,dT_n(y)$

and since the integral of $|dT_n(x)\,dT_n(y)|$ is not larger than 4, while, according
to (64) and (67)

(76) $|\psi(x, y)| \le \beta(x)\beta(y)\int a^2(r)\,dr = M\beta(x)\beta(y)$

we conclude from (74) and (75)

(77) $|f - f_k| \le 4\epsilon + M\iint_{(\bar L)} \beta(x)\beta(y)\,|dT_n(x)\,dT_n(y)|$.

This gives

(78) $E\{|f - f_k|\} \le 4\epsilon + M\iint_{(\bar L)} \beta(x)\beta(y)\,E\{|dT_n(x)\,dT_n(y)|\}$.

Now, from $|dT_n| = |dS_n - dV_n| \le dT_n + 2\,dV_n$ and from the formulas
derived in Part II,

$E\{dT_n(x)\} = 0$, $E\{dT_n(x)\,dT_n(y)\} = \frac{1}{n}\,dU_n(x, y)$

it follows

(79) $E\{|dT_n(x)\,dT_n(y)|\} \le \frac{1}{n}\,dU_n(x, y) + 4\,dV_n(x)\,dV_n(y)$

with

(79') $dU_n(x, y) = \delta(x, y)\,dV_n(x) - dW_n(x, y) \le \delta(x, y)\,dV_n(x)$.

If this is introduced in (78) and (71) taken into account, we find

(80) $E\{|f - f_k|\} \le 4\epsilon + \frac{M}{n}\int_{|x| > L} \beta^2(x)\,dV_n(x) + 4M\iint_{(\bar L)} \beta(x)\beta(y)\,dV_n(x)\,dV_n(y) \le 4\epsilon + \frac{1}{n}\,\epsilon + 4 \times 2N\epsilon \le (5 + 8N)\epsilon = \epsilon_1$

as required in (69).

On the other hand, it can be seen that the kernel $\gamma(x, y)$ as defined in (65)
is the limit of the sequence $\gamma_k(x, y)$

(81) $\gamma_k(x, y) = \begin{cases} \iint_{(L)} \alpha_k(x, s)\alpha_k(y, t)\,dU_n(s, t) & \text{for } x, y \text{ in } (R) \\ 0 & \text{for } x, y \text{ in } (\bar R) \end{cases}$

where $(R)$ means the region $|x| \le R$, $|y| \le R$ and $(\bar R)$ the complementary
region. In fact, from the definition of $k$ and eqs. (67) and (71) one has for $x$, $y$
in $(R)$

(82) $|\gamma(x, y) - \gamma_k(x, y)| \le \frac{\epsilon}{4R}\iint_{(L)} |dU_n(s, t)| + \iint_{(\bar L)} |\alpha(x, s)\alpha(y, t)\,dU_n(s, t)| \le \frac{\epsilon}{2R} + a(x)a(y)\left[\int_{|s| > L} \beta^2(s)\,dV_n(s) + \frac{1}{n}\sum_{\nu=1}^{n}\iint_{(\bar L)} \beta(s)\beta(t)\,dV_\nu(s)\,dV_\nu(t)\right] \le \frac{\epsilon}{2R} + a(x)a(y)\,\frac{\epsilon}{M}\,(1 + 2N)$.

Since $a(x)$ is bounded, the right-hand side goes to zero with $\epsilon$. Finally, for
$x$, $y$ in $(\bar R)$ we have

(83) $|\gamma(x, y) - \gamma_k(x, y)| \le \iint |\alpha(x, s)\alpha(y, t)\,dU_n(s, t)| \le a(x)a(y)\left[\int \beta^2(s)\,dV_n(s) + \frac{1}{n}\sum_{\nu=1}^{n}\iint \beta(s)\beta(t)\,dV_\nu(s)\,dV_\nu(t)\right]$.

Here, the two terms in the brackets are bounded, but $a(x)a(y)$ goes to zero as $R$
increases. The conclusion is that $\gamma_k(x, y)$ tends uniformly towards $\gamma(x, y)$
with $k \to \infty$.

Thus, eq. (66) is established provided that the function $\gamma(x, y)$ defined in (65)
has a Fredholm determinant $D(\lambda)$ that is the limit of the corresponding
algebraic determinants and provided that the c.f. $1/\sqrt{D(ui)}$ leads to a c.d.f. with
bounded derivative.

As an example let us consider the case

(84) $\alpha(r, x) = \begin{cases} \sqrt{g'(r)} & \text{for } r \ge x \\ 0 & \text{for } r < x. \end{cases}$

This function is not continuous as it was assumed in establishing (66).
However, the existence of a single discontinuity line, $x = r$, does not invalidate the
argument. We assume $g'(r) \ge 0$ and equal to $dg/dr$. Then, in the case of
(84):

(85) $\psi(x, y) = \int \alpha(r, x)\alpha(r, y)\,dr = \begin{cases} -g(y) & \text{for } x \le y \\ -g(x) & \text{for } x \ge y. \end{cases}$

Since, however, adding to $\psi$ a function of $x$ or of $y$ alone does not change the
value of $f$, we can also use

(85') $\psi(x, y) = \begin{cases} g(x) & \text{for } x \le y \\ g(y) & \text{for } x \ge y. \end{cases}$

The statistical function $f$ that corresponds to (84) can be computed either from
(85) or (85'), or directly from (84) if we use the formula that follows from (64)

(86) $f = \int \left[\int \alpha(r, x)\,dT_n(x)\right]^2 dr$.

The integral in the brackets is, in our case, seen to equal $\sqrt{g'(r)}\,T_n(r)$, thus

(86') $f = \int g'(r)\,[S_n(r) - V_n(r)]^2\,dr$.

This is exactly the test function $\omega^2$ mentioned in the Introduction, eq. (3).
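For a concrete feel of (86'), the integral can be evaluated exactly in the simplest setting. The sketch below assumes $g' = 1$ and a uniform $V(r) = r$ on the interval 0 to 1 (both assumptions for illustration), integrates $[S_n(r) - r]^2$ piece by piece between the ordered observations, and compares the result with the usual computing formula for the Cramér-von Mises statistic, $n\omega^2 = \frac{1}{12n} + \sum_i \left(u_{(i)} - \frac{2i - 1}{2n}\right)^2$. The sample values are arbitrary.

```python
from fractions import Fraction

def omega2_integral(sample):
    """Integral over [0, 1] of (S_n(r) - r)^2 dr, where S_n is the empirical
    distribution function of the sample (a step function)."""
    u = sorted(sample)
    n = len(u)
    knots = [Fraction(0)] + u + [Fraction(1)]
    total = Fraction(0)
    for i in range(n + 1):
        level = Fraction(i, n)            # S_n = i/n on [knots[i], knots[i+1])
        lo, hi = knots[i], knots[i + 1]
        # antiderivative of (r - level)^2 is (r - level)^3 / 3
        total += ((hi - level) ** 3 - (lo - level) ** 3) / 3
    return total

def omega2_closed_form(sample):
    """The same quantity from the standard computing formula, divided by n."""
    u = sorted(sample)
    n = len(u)
    s = sum((u[i] - Fraction(2 * i + 1, 2 * n)) ** 2 for i in range(n))
    return (Fraction(1, 12 * n) + s) / n

sample = [Fraction(7, 10), Fraction(1, 10), Fraction(3, 4), Fraction(2, 5)]
w_int = omega2_integral(sample)
w_cf = omega2_closed_form(sample)
print(w_int, w_cf)
```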

To find the distribution of $f$ we have to compute $\gamma(x, y)$. Its definition (65)
can be written in the form

(87) $\gamma(x, y) = \frac{1}{n}\sum_{\nu}\left[\int \alpha(x, s)\alpha(y, s)\,dV_\nu(s) - \int \alpha(x, s)\,dV_\nu(s)\int \alpha(y, s)\,dV_\nu(s)\right]$.

This supplies in the case of (84)

(88) $\gamma(x, y) = \begin{cases} \sqrt{g'(x)g'(y)}\,[V_n(x) - \overline{V_n(x)V_n(y)}] & \text{for } x \le y \\ \sqrt{g'(x)g'(y)}\,[V_n(y) - \overline{V_n(x)V_n(y)}] & \text{for } x \ge y. \end{cases}$

Here, the second term in the brackets is the arithmetical mean of the products
$V_\nu(x)V_\nu(y)$.

If the distributions $V_\nu(x)$ are all equal (independent of $\nu$) we have simply to
write $V(x)$ instead of $V_n(x)$ and $V(x)V(y)$ instead of $\overline{V_n(x)V_n(y)}$. If, in
addition, the distributions in the original collectives are uniform in the basic interval
0 to 1, one has

(89) $\gamma(x, y) = \begin{cases} \sqrt{g'(x)g'(y)}\,x(1 - y) & \text{for } 0 \le x \le y \le 1 \\ \sqrt{g'(x)g'(y)}\,y(1 - x) & \text{for } 0 \le y \le x \le 1. \end{cases}$

This is the case dealt with in Smirnoff's papers [7, 8]. If, finally, $g'(x)$ is
supposed to be equal to 1 in the interval 0, 1, we arrive at a kernel $\gamma(x, y)$ whose
Fredholm determinant is well known:

(90) $\gamma(x, y) = \begin{cases} x(1 - y) & \text{for } x \le y \\ y(1 - x) & \text{for } x \ge y, \end{cases}$ $D(\lambda) = \frac{\sin\sqrt{\lambda}}{\sqrt{\lambda}}$.

This supplies immediately the c.f. and (in form of a definite integral) the c.d.f.
of the asymptotic distribution of $\omega^2$ for $g' = 1$.
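The closed form of the Fredholm determinant in (90) can be compared with the "corresponding algebraic determinants" numerically. The following sketch discretizes the kernel $x(1 - y)$, $x \le y$, at the midpoints of $n$ equal cells of the unit interval (a Nyström-type approximation chosen here for illustration, not the construction used in the text) and evaluates $\det(I - \lambda K/n)$ with plain Gaussian elimination.

```python
import math

def det(M):
    """Determinant by Gaussian elimination with partial pivoting (floats)."""
    M = [row[:] for row in M]
    n = len(M)
    d = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            d = -d
        d *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col + 1, n):
                M[r][c] -= factor * M[col][c]
    return d

def fredholm_det(lam, n=60):
    """det(I - lam * h * K) with K_ij = min(x_i, x_j) * (1 - max(x_i, x_j))
    at the cell midpoints x_i = (i + 1/2) h, h = 1/n."""
    h = 1.0 / n
    xs = [(i + 0.5) * h for i in range(n)]
    A = [[(1.0 if i == j else 0.0)
          - lam * h * min(xs[i], xs[j]) * (1.0 - max(xs[i], xs[j]))
          for j in range(n)] for i in range(n)]
    return det(A)

lam = 5.0
approx = fredholm_det(lam)
exact = math.sin(math.sqrt(lam)) / math.sqrt(lam)
print(approx, exact)
```

Increasing `n` should bring the two numbers closer together, which is the convergence of algebraic determinants that the argument for (66) relies on.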
The same result can be reached without the use of $\alpha(r, x)$ if we apply one of
the transformations discussed in the foregoing section. Take, for instance,
instead of $\gamma(x, y)$ the unsymmetric kernel $\sigma(x, y)$ corresponding to the matrix
$S = P\Psi$ defined in (41). If all original distributions are equal, the element of
$S$ can be written as

(91) $s_{\iota\kappa} = \sum_{\mu} P_{\iota\mu}\psi_{\mu\kappa} = p_\iota\left[\psi_{\iota\kappa} - \sum_{\mu} p_\mu\psi_{\mu\kappa}\right]$.

Calling $v(x)$ the density $dV(x)/dx$ in the continuous case, the corresponding
kernel becomes

(92) $\sigma(x, y) = v(x)\left[\psi(x, y) - \int \psi(s, y)v(s)\,ds\right]$.

With the $\psi$-values from (85'), $g' = 1$, $v = 1$, this gives

(92') $\sigma(x, y) = \begin{cases} x - y + \frac{y^2}{2} & \text{for } x \le y \\ \frac{y^2}{2} & \text{for } x \ge y. \end{cases}$

It can easily be seen that the "Eigenfunctions" of this $\sigma(x, y)$ are $\sin(\sqrt{\lambda_m}\,x)$
with $\lambda_m = m^2\pi^2$, and, therefore, the Fredholm determinant is that indicated in
(90).

It might be added that the expectation and the asymptotic variance of $\omega^2$
can be computed, independently of the distribution, from the formulas
developed in Part I. The results are

(93) $nE\{\omega^2\} = \int g'(x)\,\overline{V_\nu(x)[1 - V_\nu(x)]}\,dx$

and, in the case of all $V_\nu(x)$ equal

(94) $n^2\,\mathrm{Var}\{\omega^2\} = 4\iint_{x \le y} g'(x)g'(y)\,V^2(x)[1 - V(y)]^2\,dx\,dy$.

These formulas have already been given in [4].

Another, more general, remark is this: If all $V_\nu(x)$ are equal, one can reduce
the problem, by a transformation of the original chance variable $x$ into $x' =
V(x)$, to the case of a uniform distribution over the interval 0 to 1. If the $V_\nu(x)$
are not equal, it might still be possible to find a transformation $x' = x'(x)$ such
that all original distributions extend over a finite region on the $x'$-axis only.
In this case the restrictions concerning the behavior of the distributions at
infinity drop out.
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APPROXIMATE SOLUTIONS FOR MEANS AND VARIANCES IN A 
CERTAIN CLASS OF BOX PROBLEMS 

By Philip J. McCarthy 
Social Science Research Council 

1. Summary. Consider n boxes, each box having an associated probability,
$p_i$, $(\sum_{1}^{n} p_i = 1)$, and an associated integer, $k_i$. If balls are thrown one by one
into these boxes, the probability being $p_i$ that any one ball falls into the ith box,
then the number of balls which must be thrown in order to obtain, for the first
time, at least $k_{i_1}$ balls in the $i_1$th box, at least $k_{i_2}$ balls in the $i_2$th box, ..., and at
least $k_{i_s}$ balls in the $i_s$th box, is a random variable, $N_s[k_1(p_1), k_2(p_2), \ldots, k_n(p_n)]$.
Here $i_1, i_2, \ldots, i_s$ represent the numbers of that set of s boxes, $(1 \le s \le n)$,
which first satisfies the stated condition.

The distribution of $N_s[k_1(p_1), k_2(p_2), \ldots, k_n(p_n)]$ can be written down for any
set of values assigned to n, s, the $p_i$'s and the $k_i$'s. However, for n greater than
2 the distribution assumes such an extremely complicated multinomial form
that except for certain special cases even the mean of the distribution cannot
be numerically evaluated without a prohibitive amount of labor.

This paper presents the exact moments of $N_1[k_1(p_1), k_2(p_2)]$ and $N_2[k_1(p_1), k_2(p_2)]$
in forms that readily lend themselves to computation and shows how these
moments can be used to obtain approximate values for the mean and variance
for certain situations where n is greater than two. These approximation
formulae are given for

1. The mean and variance, for any n and any set of $k_i$'s and $p_i$'s when s = 1
or n.

2. The mean, for any n and $2 \le s \le n - 1$, when $p_i = 1/n$, $k_i = k$,
$(i = 1, 2, \ldots, n)$.

Some indications are given concerning the error of the approximations, and the
circumstances which lead to a minimum (and maximum) error. Curves have
been prepared to show the mean for the two box case, the primary function
of these curves being to assist in the application of the approximation formulae.
Some problems where the results of this paper might be applicable are suggested
in the Introduction.

2. Introduction. A box problem is defined when one is given a fixed number
of boxes, a collection of balls (either finite or infinite), a set of rules governing
the throwing of the balls into the boxes and a statement of the conditions which
will bring the throwing to an end. The terminating conditions usually state
either that a fixed number of balls will be thrown or that balls will be thrown
until a particular distribution of balls in the boxes has been obtained. In the
first of these, interest is centered on the possible distributions which can be
obtained, while in the latter the number of balls necessary to obtain a specified
distribution is of primary interest.

This paper will be concerned with certain problems falling in the latter
category. In the simplest case one is given two boxes with associated probabilities
$p_1$ and $p_2$ and associated integers $k_1$ and $k_2$. Balls are thrown one by one into
the two boxes, the probability being $p_1$ that any one ball goes in the first box and
$p_2$ that it goes in the second box. This process is stopped when either $k_1$ balls
fall in box 1 or $k_2$ balls in box 2, whichever occurs first. One is interested in the
distribution of the number of balls necessary to terminate the throwing. This
problem was stated in essentially this form by Laplace [4], but he contented
himself with merely writing down the probability generating function.

Here the special case of two boxes will be treated in detail and the results
will then be generalized to the n-box case. In all of these instances it is
possible to write down exact expressions for the mean and variance of the number of
balls required to achieve the stated distribution. However, in almost every
case the resulting expressions are too complicated to be of any use when a
numerical answer is desired. The principal portion of this paper will be devoted to
obtaining approximate formulae from which numerical answers can be obtained
for these problems. Some evaluation of the degree of approximation will be
given in section 5, while curves to facilitate the computation will be given in
section 6.

The statement of these problems in terms of boxes and balls may lead one to
the belief that they have no other interpretation. Actually this is not the case,
and a few illustrations of this point will now be given. For example, consider
the curtailed single sampling plan used in acceptance sampling. A buyer
receives a lot of articles. This lot will contain a certain proportion of defective
items. The buyer wishes to determine on the basis of sampling whether to
accept or reject the lot. His knowledge of his own situation will allow him to
specify the largest proportion of defectives which he is ordinarily willing to
accept and the risk he is willing to take of accepting a lot with a proportion
defective larger than this critical proportion. On the basis of these two values it
is possible to set up a sampling plan in which the buyer will take a sample of size
n out of the lot, inspect it, and reject it if there are $k_1$ or more defectives in the
sample. Of course once he has obtained $k_1$ defectives there is no need to inspect
the remainder of the sample. The lot will then be automatically rejected.
Similarly, once he has obtained $n - k_1$ non-defectives, he can accept the lot
without inspecting the remainder of the items. The average number of items which
he must inspect in order to reach a decision is given by the solution to the two
box problem stated above. Box 1 will receive the defective items, the
associated integer being $k_1$ and the associated probability being $p_1$, the true
proportion of defectives in the lot. Box 2 will receive the non-defective items, the
associated integer being $n - k_1$ and the associated probability being $p_2$, the true
proportion of non-defectives in the lot.

Laplace [4] considered problems of this type as applied to games of chance.
Thus suppose there are two players A and B who participate in successive trials
of a given event, the probability being $p_1$ that A wins on any one trial and $p_2$
that B wins. Then one can associate the integer $k_1$ with A and $k_2$ with B by
saying that A wins the match if he wins $k_1$ trials before B wins $k_2$ trials and
conversely. The analysis is exactly the same as for the two box problem. It is
apparent that this same situation can be extended to any number of players.

Another possible interpretation is as a particular kind of random walk
problem. Let a particle start at the origin of a system of rectangular coordinates
and suffer successive positive unit displacements, the probability being $p_1$ that
it moves one unit in the x-direction and $p_2$ that it moves one unit in the
y-direction. Furthermore assume that it is absorbed if it ever reaches the line
$x = k_1$ or the line $y = k_2$. Then the analysis of the above two box problem
gives the mean number of displacements before it is absorbed. In the same
manner, such a random walk problem can be stated for n dimensions. For n
equal to three, there will be three planes and the particle will be absorbed when
it reaches any one of the three.

Certain problems in public opinion polling may fit into this category of box
problems, particularly if the above problem is rephrased so that one requires
the mean number of trials to obtain at least $k_1$ balls in the first box and at least
$k_2$ balls in the second box, for the first time. For example, suppose that one
desires to sample from a population composed of two types of individuals,
A and B. Let the population proportions of A and B be known and be
denoted by $p_1$ and $p_2$. Then if one wishes to obtain at least $k_1$ individuals of type
A and at least $k_2$ individuals of type B, the average number of persons who must
be chosen in order to fulfill this condition is given by the analysis of the
corresponding box problem. This is rather artificial when there are only two
categories and $p_1 + p_2 = 1$. However, these restrictions will be removed in the
course of the paper, and the problem will be considered for any number of types
of individuals.

As a final example, consider one of the many bombing problems which arose
during the course of war research. Suppose that a factory which is to be
demolished has n vital units, the destruction of any one of which will destroy the
usefulness of the factory. Let the probability be $p_1$ of hitting the first unit with
a single bomb, $p_2$ the probability of hitting the second with a single bomb, etc.,
and assume that $k_1$ bomb hits will finish off the first unit, $k_2$ the second, etc.
Then the mean number of bombs required will be given by the analysis for the
corresponding box problem.

Corresponding interpretations are possible for the other problems which are 
to be considered in this paper. Some of these will be indicated as the analysis 
proceeds and it is to be hoped that others will occur to the reader. 

As previously noted, this paper will be concerned with the distribution of balls
necessary to terminate the throwing, assuming the p's are known. Another
possible interpretation is to assume the p's unknown and to estimate them with
the results of the ball throwing. Certain aspects of this problem for two boxes
have been considered by J. B. S. Haldane [3] and Girshick, Mosteller and Savage
[2].


3. Solution for the two box case. 


3.1. Distribution and moments of the number of trials necessary to obtain either
$k_1$ balls in the first box or $k_2$ balls in the second box. This problem may be stated
as follows: Suppose one is given two boxes with associated probabilities $p_1$ and
$p_2$, and associated integers $k_1$ and $k_2$. For the present it will be assumed that
$p_1 + p_2 = 1$, although this restriction will be removed later. Now let balls be
thrown one by one into these two boxes, the probability being $p_1$ that a particular
ball will fall in the first box and $p_2$ that it will fall in the second box. This
process is stopped on the first ball which leaves either $k_1$ balls in the first box or
$k_2$ balls in the second box. The number of balls, $x$, which is required to
accomplish this is a random variable and we desire the moments of $x$. The probability
that $k_1$ balls are obtained in the first box on the xth throw, $k_1 \le x \le k_1 + k_2 - 1$,
before $k_2$ balls are obtained in the second box, is immediately seen to be

(3.1) $\binom{x - 1}{k_1 - 1} p_1^{k_1 - 1} p_2^{x - k_1} \cdot p_1 = \binom{x - 1}{k_1 - 1} p_1^{k_1} p_2^{x - k_1}$.

Similar reasoning gives the probability that $k_2$ balls are obtained in the second
box for the first time on the xth throw, $k_2 \le x \le k_1 + k_2 - 1$, as

(3.2) $\binom{x - 1}{k_2 - 1} p_1^{x - k_2} p_2^{k_2}$.

From (3.1) and (3.2), the hth moment of $x$, $E(x^h)$, is

(3.3) $\sum_{x = k_1}^{k_1 + k_2 - 1} x^h \binom{x - 1}{k_1 - 1} p_1^{k_1} p_2^{x - k_1} + \sum_{x = k_2}^{k_1 + k_2 - 1} x^h \binom{x - 1}{k_2 - 1} p_1^{x - k_2} p_2^{k_2}$.

However, it is inconvenient to consider (3.3) directly. A much simpler
procedure is to determine the increasing factorial moments of $x$ and then transform
these into the ordinary moments. Thus the hth increasing factorial moment of
$x$, $F_{h,1}[k_1(p_1), k_2(p_2)]$, is defined as $E[x(x + 1) \cdots (x + h - 1)]$. Then $F_{h,1}[\ ]$
is equal to

(3.4) $\sum_{x = k_1}^{k_1 + k_2 - 1} \frac{(x + h - 1)!}{(x - 1)!} \binom{x - 1}{k_1 - 1} p_1^{k_1} p_2^{x - k_1} + \sum_{x = k_2}^{k_1 + k_2 - 1} \frac{(x + h - 1)!}{(x - 1)!} \binom{x - 1}{k_2 - 1} p_1^{x - k_2} p_2^{k_2}$.

(3.4) can be transformed by means of the relationship

(3.5) $\sum_{x=0}^{a} \binom{k + x}{x} p^x = (1 - p)^{-(k+1)}\, I_{1-p}(k + 1, a + 1)$,

where $I_x(p, q)$ is the Incomplete Beta-Function as tabulated by Karl Pearson
[6], and the result is obtained that

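(3.6) $F_{h,1}[k_1(p_1), k_2(p_2)] = \frac{k_1(k_1 + 1) \cdots (k_1 + h - 1)}{p_1^h}\, I_{p_1}(k_1 + h, k_2) + \frac{k_2(k_2 + 1) \cdots (k_2 + h - 1)}{p_2^h}\, I_{p_2}(k_2 + h, k_1)$.

The ordinary hth moment of $x$ may be written in terms of $F_{1,1}[\ ], F_{2,1}[\ ], \ldots,
F_{h,1}[\ ]$ as

(3.7) $E(x^h) = \sum_{i=1}^{h} (-1)^{h+i}\, \frac{\Delta^i 0^h}{i!}\, F_{i,1}[\ ]$

where $\Delta^i 0^h$ represents a difference of zero. Tabular values of $\Delta^i 0^h/i!$ are given
by Fisher and Yates [1].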
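The conversion from increasing factorial moments to ordinary moments can be sketched in a few lines. The code below assumes that (3.7) is the standard expansion $x^h = \sum_i (-1)^{h+i}\,(\Delta^i 0^h/i!)\,x(x + 1)\cdots(x + i - 1)$, and computes the differences of zero from their definition as forward differences of $x^h$ at $x = 0$; the function names are illustrative.

```python
from math import comb, factorial

def diff_of_zero(i, h):
    """Delta^i 0^h: the i-th forward difference of x^h evaluated at x = 0."""
    return sum((-1) ** (i - j) * comb(i, j) * j ** h for j in range(i + 1))

def raw_from_rising(h, rising):
    """E(x^h) from the increasing factorial moments
    rising[i] = E[x(x+1)...(x+i-1)], i = 1..h."""
    return sum((-1) ** (h + i) * (diff_of_zero(i, h) // factorial(i)) * rising[i]
               for i in range(1, h + 1))

# Check on a degenerate variable x = 7, whose rising factorials are 7, 7*8, 7*8*9.
rising = {1: 7, 2: 7 * 8, 3: 7 * 8 * 9}
third_moment = raw_from_rising(3, rising)
table_row = [diff_of_zero(i, 4) // factorial(i) for i in range(1, 5)]
print(third_moment, table_row)
```

The integer division is exact because $\Delta^i 0^h / i!$ is always a whole number (a Stirling number of the second kind), which is what a table of these quantities lists.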

In particular, the mean and variance of $x$, which will receive the special
designations $E_1[k_1(p_1), k_2(p_2)]$ and $\sigma_1^2[k_1(p_1), k_2(p_2)]$ respectively, are

(3.8) $E_1[k_1(p_1), k_2(p_2)] = \frac{k_1}{p_1}\, I_{p_1}(k_1 + 1, k_2) + \frac{k_2}{p_2}\, I_{p_2}(k_2 + 1, k_1)$

and

(3.9) $\sigma_1^2[k_1(p_1), k_2(p_2)] = \frac{k_1(k_1 + 1)}{p_1^2}\, I_{p_1}(k_1 + 2, k_2) + \frac{k_2(k_2 + 1)}{p_2^2}\, I_{p_2}(k_2 + 2, k_1) - E_1[k_1(p_1), k_2(p_2)] - \{E_1[k_1(p_1), k_2(p_2)]\}^2$.

In the event the p's are equal and sum to one, $E_1[k_1(p_1), k_2(p_2)]$ will be abbreviated
to $E_1[k_1, k_2]$, and finally, if both the p's and k's are equal, it will be written as
$E_1[k^2]$. In this two box situation, the only other possibility is $E_2[k_1(p_1), k_2(p_2)]$,
which will denote the expected number of balls required to obtain at least $k_1$
in the first box and at least $k_2$ in the second box, for the first time. This problem
will be considered in section 3.2.
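For small integers the mean and variance can be obtained both ways and compared. The sketch below sums the finite distribution given by (3.1) and (3.2) directly and, separately, evaluates the Beta-function expressions, reading (3.8) as $(k_1/p_1) I_{p_1}(k_1 + 1, k_2) + (k_2/p_2) I_{p_2}(k_2 + 1, k_1)$ and (3.9) analogously. For integer arguments the Incomplete Beta-Function is computed from the binomial tail $I_x(a, b) = P\{\mathrm{Bin}(a + b - 1, x) \ge a\}$. The values of $k_1$, $k_2$, $p_1$ are arbitrary illustrations.

```python
from fractions import Fraction
from math import comb

def inc_beta(x, a, b):
    """I_x(a, b) for positive integers a, b via the binomial tail."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1))

def stop_pmf(k1, k2, p1):
    """Distribution of the stopping throw x from (3.1) and (3.2), p2 = 1 - p1."""
    p2 = 1 - p1
    pmf = {}
    for x in range(min(k1, k2), k1 + k2):
        pr = 0
        if x >= k1:   # box 1 reaches k1 on throw x
            pr += comb(x - 1, k1 - 1) * p1 ** k1 * p2 ** (x - k1)
        if x >= k2:   # box 2 reaches k2 on throw x
            pr += comb(x - 1, k2 - 1) * p1 ** (x - k2) * p2 ** k2
        pmf[x] = pr
    return pmf

def mean_var_formula(k1, k2, p1):
    """Mean and variance from the Beta-function forms of (3.8) and (3.9)."""
    p2 = 1 - p1
    mean = k1 / p1 * inc_beta(p1, k1 + 1, k2) + k2 / p2 * inc_beta(p2, k2 + 1, k1)
    f2 = (k1 * (k1 + 1) / p1 ** 2 * inc_beta(p1, k1 + 2, k2)
          + k2 * (k2 + 1) / p2 ** 2 * inc_beta(p2, k2 + 2, k1))
    return mean, f2 - mean - mean ** 2

k1, k2, p1 = 3, 5, Fraction(2, 7)
pmf = stop_pmf(k1, k2, p1)
mean_direct = sum(x * pr for x, pr in pmf.items())
var_direct = sum(x * x * pr for x, pr in pmf.items()) - mean_direct ** 2
mean_formula, var_formula = mean_var_formula(k1, k2, p1)
print(float(mean_direct), float(var_direct))
```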

In order to facilitate the computation of mean values, both for the two box
problem itself and for its application to problems involving a larger number of
boxes, (3.8) has been graphed for various values of $k_1$, $k_2$, $p_1$ and $p_2$. A
discussion of this procedure and the results obtained will be found in section 6.

There is one further result which will later prove useful. Consider the
situation when there is only one box with $p_1$ and $k_1$, $p_1 < 1$. This is the same as
having two boxes where the $k_2$ corresponding to the second box is infinite. In
other words, one can terminate the throwing of balls only because of what
happens to the first box, never because of anything that happens to the second box.
In this case one obtains

(3.10) $E_1[k_1(p_1), \infty(p_2)] = \sum_{x = k_1}^{\infty} x \binom{x - 1}{k_1 - 1} p_1^{k_1} p_2^{x - k_1} = \frac{k_1}{p_1}$.

Similarly,

(3.11) $\sigma_1^2[k_1(p_1), \infty(p_2)] = \frac{k_1 p_2}{p_1^2}$.

3.2. Distribution and moments of the number of throws necessary to obtain at least $k_1$ balls in the first box and at least $k_2$ balls in the second box. This problem may be stated as follows: Suppose there are two boxes with associated probabilities $p_1$ and $p_2$, and associated integers $k_1$ and $k_2$. As in 3.1, $p_1 + p_2 = 1$. Let balls be thrown into the boxes one by one, the probability being $p_1$ that a particular ball will fall in the first box and $p_2$ that it will fall in the second box. This process is stopped on the first ball which leaves at least $k_1$ in the first box and exactly $k_2$ in the second or at least $k_2$ in the second and exactly $k_1$ in the first. Again $x$ is the number of balls required to accomplish this. As explained in 3.1, the mean value in this case will be written as $E_2[k_1(p_1), k_2(p_2)]$. The analysis follows through as in 3.1 and the mean number of trials is equal to

$$\sum_{x=k_1+k_2}^{\infty} x \binom{x-1}{k_1-1} p_1^{k_1} p_2^{x-k_1} + \sum_{x=k_1+k_2}^{\infty} x \binom{x-1}{k_2-1} p_1^{x-k_2} p_2^{k_2}. \tag{3.12}$$

Making use of (3.5), this can be written as

$$\frac{k_1}{p_1}\,[1 - I_{p_1}(k_1+1,\, k_2)] + \frac{k_2}{p_2}\,[1 - I_{p_2}(k_2+1,\, k_1)]. \tag{3.13}$$

Referring to (3.8) it is evident that

$$E_1[k_1(p_1), k_2(p_2)] + E_2[k_1(p_1), k_2(p_2)] = \frac{k_1}{p_1} + \frac{k_2}{p_2}. \tag{3.14}$$
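Relation (3.14) lends itself to a quick numerical check. The sketch below assumes integer $k_1$, $k_2$ and $p_1 + p_2 = 1$, and evaluates $E_2$ from the series (3.12) truncated at an arbitrary cut-off `x_max`; the function names and the cut-off are assumptions made for illustration, and the truncation is adequate only when the $k$'s are small and neither probability is close to one.

```python
from math import comb


def mean_first(k1, p1, k2, p2):
    """E_1: mean throws until box 1 has k1 balls or box 2 has k2 (p1 + p2 = 1)."""
    total = 0.0
    for x in range(1, k1 + k2):
        if x >= k1:
            total += x * comb(x - 1, k1 - 1) * p1**k1 * p2**(x - k1)
        if x >= k2:
            total += x * comb(x - 1, k2 - 1) * p2**k2 * p1**(x - k2)
    return total


def mean_both(k1, p1, k2, p2, x_max=2000):
    """E_2 from the series (3.12), truncated at x_max (an assumed cut-off)."""
    total = 0.0
    for x in range(k1 + k2, x_max + 1):
        # Last ball completes box 1 while box 2 already holds at least k2.
        total += x * comb(x - 1, k1 - 1) * p1**k1 * p2**(x - k1)
        # Last ball completes box 2 while box 1 already holds at least k1.
        total += x * comb(x - 1, k2 - 1) * p2**k2 * p1**(x - k2)
    return total


k1, p1, k2, p2 = 2, 0.4, 3, 0.6
lhs = mean_first(k1, p1, k2, p2) + mean_both(k1, p1, k2, p2)
rhs = k1 / p1 + k2 / p2
print(lhs, rhs)
```

The two printed numbers should agree to within rounding, since the first stopping time and the second stopping time together account for the waiting times of both boxes.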


The $h$th increasing factorial moment in this problem, denoted by $F_{h,2}[k_1(p_1), k_2(p_2)]$, is

$$F_{h,2}[k_1(p_1), k_2(p_2)] = \frac{k_1(k_1+1)\cdots(k_1+h-1)}{p_1^{h}}\,[1 - I_{p_1}(k_1+h,\, k_2)] + \frac{k_2(k_2+1)\cdots(k_2+h-1)}{p_2^{h}}\,[1 - I_{p_2}(k_2+h,\, k_1)]. \tag{3.15}$$

Comparison of (3.15) with (3.6) gives the relationship

$$F_{h,1}[\;] + F_{h,2}[\;] = \frac{k_1(k_1+1)\cdots(k_1+h-1)}{p_1^{h}} + \frac{k_2(k_2+1)\cdots(k_2+h-1)}{p_2^{h}}. \tag{3.16}$$

The ordinary moments of $x$ can be computed from (3.15) by the use of (3.7). That is, formula (3.7) holds in this case if $F_{i,1}[\;]$ is replaced by $F_{i,2}[\;]$.

It can be easily shown by the use of the recursion relationship for the Incomplete Beta-Function,

$$I_x(p, q) = x\, I_x(p-1,\, q) + (1-x)\, I_x(p,\, q-1),$$

that $F_{h,1}[\;]$ and $F_{h,2}[\;]$ satisfy the partial difference equation

$$F_{h,i}[k_1(p_1), k_2(p_2)] = h\, F_{h-1,i}[k_1(p_1), k_2(p_2)] + p_1 F_{h,i}[(k_1-1)(p_1), k_2(p_2)] + p_2 F_{h,i}[k_1(p_1), (k_2-1)(p_2)], \tag{3.17}$$

where $i = 1$ or 2. This equation can be used as an alternative way of obtaining many results, examples of which are (3.10) and (3.11). Certain of these applications have been discussed by McCarthy [5].
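For $h = 1$ the difference equation (3.17) gives a direct way of computing the two means. The sketch below is one possible reading, not the procedure of [5]: it takes $F_{0,i} = 1$, and it assumes the boundary values stated in the comments (a mean of zero for $E_1$ once either integer is zero, and the one box result (3.10) for $E_2$ once one integer is zero). The name `make_means` is hypothetical.

```python
from functools import lru_cache


def make_means(p1, p2):
    """Return (E1, E2) evaluated through (3.17) with h = 1 and F_0 = 1."""

    @lru_cache(maxsize=None)
    def e1(k1, k2):
        # Assumed boundary value: nothing is left to wait for once either integer is 0.
        if k1 == 0 or k2 == 0:
            return 0.0
        return 1.0 + p1 * e1(k1 - 1, k2) + p2 * e1(k1, k2 - 1)

    @lru_cache(maxsize=None)
    def e2(k1, k2):
        # Assumed boundary values from (3.10): only one box still matters.
        if k1 == 0:
            return k2 / p2
        if k2 == 0:
            return k1 / p1
        return 1.0 + p1 * e2(k1 - 1, k2) + p2 * e2(k1, k2 - 1)

    return e1, e2


e1, e2 = make_means(0.4, 0.6)
print(e1(2, 3) + e2(2, 3), 2 / 0.4 + 3 / 0.6)
```

The recursion visits each pair $(k_1, k_2)$ once, so it is practical for moderately large integers, and the printed sum can be compared with the right hand side of (3.14).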

4. Solution for the n box case. 

4.1. Preliminary discussion. The problems of this section, although direct generalizations of the two box cases, can perhaps be most easily stated and illustrated as applied to the behavior of a random particle. Suppose that we have a random particle which starts at the origin of $n$-dimensional rectangular coordinates and moves in unit steps along the positive coordinate axes. At any given point the probability will be taken as $p_i$ that it moves in the $x_i$-direction. $\sum_{i=1}^{n} p_i$ is assumed to be one unless otherwise specified. Now consider the $n$ hyperplanes, $x_i = k_i$, and assume that the particle will be absorbed if it passes through a specified number, say $s$, of these hyperplanes. Notice that we are interested only in the number of planes which it passes through, and not in the particular ones. For each $s$, $(s = 1, 2, \cdots, n)$, the number of moves which the particle makes before it is absorbed is a random variable, and in this section we will be concerned with the distribution of this random variable. The corresponding interpretation for boxes and balls is immediately obvious.
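For small problems the mean number of moves of such a particle can be computed exactly by following the probability of each unabsorbed position from move to move. The sketch below is an illustration under stated assumptions (integer $k_i$, probabilities summing to one, $1 \le s \le n$) and is not a method given in this paper; it uses the identity that the mean of a positive integer variable is the sum of its tail probabilities. The name `mean_moves` and the tolerance are hypothetical.

```python
def mean_moves(ks, ps, s, tol=1e-13, max_steps=100000):
    """Mean number of moves before the particle has passed s of the n
    hyperplanes x_i = k_i, assuming sum(ps) == 1 and 1 <= s <= len(ks)."""
    alive = {tuple(0 for _ in ks): 1.0}   # unabsorbed positions -> probability
    mean = 0.0
    for _ in range(max_steps):
        mass = sum(alive.values())
        if mass < tol:
            break
        mean += mass                       # E[T] = sum over t >= 0 of P(T > t)
        nxt = {}
        for state, prob in alive.items():
            for i, p in enumerate(ps):
                # Coordinates are capped at k_i; only "reached or not" matters.
                moved = list(state)
                moved[i] = min(moved[i] + 1, ks[i])
                reached = sum(1 for c, k in zip(moved, ks) if c >= k)
                if reached < s:
                    key = tuple(moved)
                    nxt[key] = nxt.get(key, 0.0) + prob * p
        alive = nxt
    return mean


print(mean_moves((1, 1, 1), (1 / 3, 1 / 3, 1 / 3), 2))
```

The number of positions grows as the product of the $k_i$, which is why this direct route becomes impractical for many boxes or large integers.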

These problems are seen to be generalizations of the two box cases considered in section 3. Although it is always relatively easy to write down formal expressions for the quantities to be considered, the step from two boxes to three or more boxes produces expressions which are extremely difficult, or even impossible, to evaluate. In this section we shall develop approximate solutions which make use only of simple computations based on the solution for the two box case.

As an introduction to the contents of this section, we shall discuss briefly a box problem which is a special case of the general problem. Assume that there are $n$ boxes with a probability of $1/n$ that any one ball will be thrown into a particular one of the $n$ boxes. Then one can ask for the mean and variance of the number of trials required to obtain $s$ occupied boxes (i.e. $k_1 = k_2 = \cdots = k_n = 1$). Making use of (3.10) and (3.11), we obtain

$$E_1[1^n] = 1, \qquad E_s[1^n] = 1 + \frac{n}{n-1} + \cdots + \frac{n}{n-s+1} = n \sum_{i=0}^{s-1} \frac{1}{n-i} \tag{4.1}$$

and

$$\begin{aligned}
\sigma_1^2[1^n] &= 0, \\
\sigma_2^2[1^n] &= 0 + \frac{n}{(n-1)^2}, \\
\sigma_3^2[1^n] &= 0 + \frac{n}{(n-1)^2} + \frac{2n}{(n-2)^2}, \\
\sigma_s^2[1^n] &= 0 + \frac{n}{(n-1)^2} + \frac{2n}{(n-2)^2} + \cdots + \frac{(s-1)n}{(n-s+1)^2} = n \sum_{i=0}^{s-1} \frac{i}{(n-i)^2}.
\end{aligned} \tag{4.2}$$

The solution for this problem for $s = n$ is given in Uspensky [9], but a straightforward solution requires a great deal of formal manipulation. The step-by-step procedure used here is somewhat indicative of the methods to be used in the succeeding portions of this paper.
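The sums in (4.1) and (4.2) are easily evaluated in exact arithmetic. The following minimal sketch assumes $1 \le s \le n$ and uses rational numbers so that no rounding enters; the function name `occupancy_mean_var` is illustrative.

```python
from fractions import Fraction


def occupancy_mean_var(n, s):
    """Mean and variance of the number of throws needed to occupy s of n
    equally likely boxes, from the sums in (4.1) and (4.2)."""
    # With i boxes already occupied, the wait for a new box has mean
    # n / (n - i) and variance i * n / (n - i)**2.
    mean = sum(Fraction(n, n - i) for i in range(s))
    var = sum(Fraction(i * n, (n - i) ** 2) for i in range(s))
    return mean, var


print(occupancy_mean_var(6, 6))
```

Each term corresponds to one stage of the step-by-step argument: the wait for the next new box, given the number already occupied.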

4.2. Mean and variance of the number of trials required to obtain either $k_1$ balls in the first box, or $k_2$ in the second, $\cdots$, or $k_{n-1}$ in the $(n-1)$st, the probability associated with the $n$th box being non-zero. The mean number of trials in this particular problem is represented by $E_1[k_1(p_1), \cdots, k_{n-1}(p_{n-1}), \infty(p_n)]$. The formal expression for this quantity is

$$\sum_{i=1}^{n-1} \sum_{j=k_i}^{\infty} j\, \frac{(j-1)!}{(k_i-1)!\,(j-k_i)!} \sum \frac{(j-k_i)!}{r_1! \cdots r_{i-1}!\; r_{i+1}! \cdots r_n!}\; p_1^{r_1} \cdots p_{i-1}^{r_{i-1}}\, p_i^{k_i}\, p_{i+1}^{r_{i+1}} \cdots p_n^{r_n}, \tag{4.3}$$

where the third sum is taken over all values of the $r$'s such that

$$r_1 + \cdots + r_{i-1} + r_{i+1} + \cdots + r_n = j - k_i$$

and

$$r_1 < k_1,\ \cdots,\ r_{i-1} < k_{i-1},\ r_{i+1} < k_{i+1},\ \cdots,\ r_{n-1} < k_{n-1}.$$

This expression can be reduced by one dimension by the application of some of the results for two boxes. Consider for the moment only those balls going into the first $(n-1)$ boxes. Then the number of balls (conditional) which is necessary to obtain either $k_1$ in the first box, or $k_2$ in the second, $\cdots$, or $k_{n-1}$ in the $(n-1)$st box is a random variable $X$ which takes on values

$$k_1,\ k_1 + 1,\ \cdots,\ k_1 + k_2 + \cdots + k_{n-1} - (n-2)$$

with corresponding probabilities $\pi_j$, where with no loss of generality it is assumed that $k_1 \le k_2 \le \cdots \le k_{n-1}$. $\pi_j$ is given by a sum of $(n-1)$ multinomial expressions, the probability associated with the $i$th box now being $p_i/(p_1 + \cdots + p_{n-1})$, which will be designated by $p_i'$.

Under these circumstances it is apparent that

$$E_1[k_1(p_1), \cdots, k_{n-1}(p_{n-1}), \infty(p_n)] = \sum_j \pi_j\, E_1[x_j(p_1 + \cdots + p_{n-1}), \infty(p_n)]. \tag{4.4}$$

However, (3.10) can be applied to each term in (4.4), leading to

$$\frac{1}{(p_1 + p_2 + \cdots + p_{n-1})} \sum_j \pi_j x_j. \tag{4.5}$$

Now from the definition of $\pi_j$ and $x_j$ we have

$$E_1[k_1(p_1), \cdots, k_{n-1}(p_{n-1}), \infty(p_n)] = \frac{1}{(p_1 + p_2 + \cdots + p_{n-1})}\, E_1[k_1(p_1'), k_2(p_2'), \cdots, k_{n-1}(p_{n-1}')]. \tag{4.6}$$


Similarly, the application of (3.11) gives the result that

$$\sigma_1^2[k_1(p_1), \cdots, k_{n-1}(p_{n-1}), \infty(p_n)] = \frac{p_n}{(p_1 + p_2 + \cdots + p_{n-1})^2}\, E_1[k_1(p_1'), \cdots, k_{n-1}(p_{n-1}')]. \tag{4.7}$$

These results are of immediate importance for two reasons:

1. They indicate that by combining boxes and introducing a new random variable, certain problems can be simplified. This statement will be expanded and the principle applied repeatedly in the later portions of this paper.

2. With respect to the section on two boxes, they mean that the restriction $p_1 + p_2 = 1$ is not necessary for the solution of the problems. One can always assume that $p_3\ (= 1 - p_1 - p_2)$ refers to a box which receives balls but which otherwise has no effect on the outcome of an experiment. In this paper it has been convenient to refer to such a box as having an infinite capacity.
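The second point can be illustrated numerically for two boxes together with a box of infinite capacity. The sketch below is an assumption-laden check rather than part of the original argument: the left side of (4.6) is obtained by following the unabsorbed probability throw by throw with the idle box included (truncated after an arbitrary number of throws), and the right side from the two box recursion with the rescaled probabilities $p_i' = p_i/(p_1 + p_2)$. Both function names are hypothetical.

```python
from functools import lru_cache


def mean_with_idle_box(k1, p1, k2, p2, steps=5000):
    """Left side of (4.6) for n = 3: throw-by-throw survival sum, where a
    third box with probability 1 - p1 - p2 receives balls without effect."""
    p3 = 1.0 - p1 - p2
    alive = {(0, 0): 1.0}
    mean = 0.0
    for _ in range(steps):               # truncation length is an assumption
        mean += sum(alive.values())
        nxt = {}
        for (a, b), prob in alive.items():
            for state, p in (((a + 1, b), p1), ((a, b + 1), p2), ((a, b), p3)):
                if state[0] < k1 and state[1] < k2:
                    nxt[state] = nxt.get(state, 0.0) + prob * p
        alive = nxt
    return mean


def mean_rescaled(k1, p1, k2, p2):
    """Right side of (4.6): two box mean with p_i' = p_i / (p1 + p2),
    divided by p1 + p2."""
    total = p1 + p2
    q1, q2 = p1 / total, p2 / total

    @lru_cache(maxsize=None)
    def e1(a, b):
        if a == 0 or b == 0:
            return 0.0
        return 1.0 + q1 * e1(a - 1, b) + q2 * e1(a, b - 1)

    return e1(k1, k2) / total


print(mean_with_idle_box(2, 0.2, 3, 0.3), mean_rescaled(2, 0.2, 3, 0.3))
```

The two printed values should coincide to within rounding, which is the content of (4.6) in this small case.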

4.3. The mean value and variance of the number of trials required in a two box problem when one or both of the constants $k_1$ and $k_2$ are replaced by random variables. The discussion in 4.2 has indicated that the idea of associating a random variable with a box instead of a single integer may sometimes lead to simplification. Here this procedure will be treated in more detail. Consider $E_1[k_1(p_1), k_2(p_2)]$ and assume that $k_1$ is replaced by a random variable $X$ which can take on values $x_1, x_2, \cdots, x_t$ with corresponding probabilities $\pi_1, \cdots, \pi_t$. Under these circumstances $E_1[\;]$ itself becomes the random variable $E_1[X(p_1), k_2(p_2)]$, taking on values $E_1[x_i(p_1), k_2(p_2)]$, $(i = 1, 2, \cdots, t)$, with corresponding probabilities $\pi_i$. The mean value of this new random variable can be formally written down as

$$E(E_1[X(p_1), k_2(p_2)]) = \sum_{i=1}^{t} \pi_i\, E_1[x_i(p_1), k_2(p_2)]. \tag{4.8}$$

This expression can always be calculated from the probabilities $\pi_i$ and (3.8) or from the curves given in section 6. However, in the applications which will arise later in this paper, this computation would be very time consuming. Instead, an approximation to (4.8) will now be derived which will prove to yield very good results, and which can be obtained by a simple reading on the above mentioned curves.

If $X$ is regarded as a continuous variable, then $E_1[X(p_1), k_2(p_2)]$ is a continuous function of $X$, and, in fact, can be represented by a single curve similar to those appearing in section 6. Moreover, as is apparent from (3.8), repeated differentiation of $E_1[X(p_1), k_2(p_2)]$ yields continuous derivatives. Consequently, $E_1[X(p_1), k_2(p_2)]$ can be expanded in Taylor series about $a$, where $a = \sum_{i=1}^{t} \pi_i x_i$. This procedure gives

$$E(E_1[X(p_1), k_2(p_2)]) = \sum_{i=1}^{t} \pi_i \sum_{j=0}^{\infty} \frac{(x_i - a)^j}{j!}\, E_1^{(j)}[a(p_1), k_2(p_2)], \tag{4.9}$$

where $E_1^{(j)}[a(p_1), k_2(p_2)]$ represents the $j$th derivative of $E_1[X(p_1), k_2(p_2)]$ with respect to $X$ evaluated at $a$. Interchanging the order of summation one obtains

$$\sum_{j=0}^{\infty} \frac{E_1^{(j)}[a(p_1), k_2(p_2)]}{j!} \sum_{i=1}^{t} \pi_i (x_i - a)^j. \tag{4.10}$$

The final result then becomes

$$E(E_1[X(p_1), k_2(p_2)]) = \sum_{j=0}^{\infty} \frac{E_1^{(j)}[a(p_1), k_2(p_2)]}{j!}\, \mu_j, \tag{4.11}$$

where $\mu_j$ is the $j$th moment of $X$ about its mean, $a$. Thus to a first approximation

$$E(E_1[X(p_1), k_2(p_2)]) \cong E_1[a(p_1), k_2(p_2)]. \tag{4.12}$$

It is of interest to note that if $E_1[X(p_1), k_2(p_2)]$ is linear in $X$ then (4.12) is an exact expression, since all derivatives except the first are zero and the first derivative is multiplied by $\mu_1 = 0$. Furthermore, if $E_1[X(p_1), k_2(p_2)]$ is of the second degree in $X$, then only the second non-zero term on the right hand side of (4.11) needs to be added to (4.12) in order to make it exact. The former of these is the relation which gave an exact solution in 4.2.

It is important to realize that this analysis for $E(E_1[X(p_1), k_2(p_2)])$ can be immediately applied to $E(E_2[X(p_1), k_2(p_2)])$. For, by the use of (3.14) and (4.8), one obtains

$$E(E_2[X(p_1), k_2(p_2)]) = \frac{a}{p_1} + \frac{k_2}{p_2} - E(E_1[X(p_1), k_2(p_2)]). \tag{4.13}$$

The same analysis can be applied to $F_{h,i}[\;]$ and the general result obtained that

$$E(F_{h,i}[X(p_1), k_2(p_2)]) \cong F_{h,i}[a(p_1), k_2(p_2)]. \tag{4.14}$$

This immediately allows one to approximate the variance in the obvious manner.

It is of interest to consider briefly the situation when both $k_1$ and $k_2$ are replaced by random variables. Let $k_1$ be replaced by $X_1$ taking on values $x_{11}, x_{12}, \cdots, x_{1t}$ with probabilities $\pi_{11}, \pi_{12}, \cdots, \pi_{1t}$ and $k_2$ be replaced by $X_2$ taking on values $x_{21}, x_{22}, \cdots, x_{2s}$ with probabilities $\pi_{21}, \pi_{22}, \cdots, \pi_{2s}$. Then

$$E(E_1[X_1(p_1), X_2(p_2)]) = \sum_{i,j} \pi_{1i}\, \pi_{2j}\, E_1[x_{1i}(p_1), x_{2j}(p_2)], \tag{4.15}$$

where $i = 1, 2, \cdots, t$ and $j = 1, 2, \cdots, s$. Again applying Taylor series and expanding about $a = \sum_i \pi_{1i} x_{1i}$ and $b = \sum_j \pi_{2j} x_{2j}$, the result is obtained that

$$E(E_1[X_1(p_1), X_2(p_2)]) = \sum_{u} \sum_{v} \frac{E_1^{(u,v)}[a(p_1), b(p_2)]}{u!\, v!}\, \mu_{1u}\, \mu_{2v}, \tag{4.16}$$

where $E_1^{(u,v)}[a(p_1), b(p_2)]$ is the $u$th partial derivative with respect to $X_1$ and the $v$th partial derivative with respect to $X_2$ of $E_1[X_1(p_1), X_2(p_2)]$ evaluated at $X_1 = a$, $X_2 = b$, and $\mu_{1u}$, $\mu_{2v}$ are the corresponding moments of $X_1$ and $X_2$ about their means. This gives the approximate formula

$$E(E_1[X_1(p_1), X_2(p_2)]) \cong E_1[a(p_1), b(p_2)]. \tag{4.17}$$

4.4. Mean and variance of the number of trials required to obtain either (at least) $k_1$ balls in the first box, or (at least) $k_2$ balls in the second box, $\cdots$, or (and at least) $k_n$ balls in the $n$th box. In accordance with previous notation the mean number of trials required is given by $E_1[k_1(p_1), \cdots, k_n(p_n)]$. The exact value of this quantity can be written down and it would be a complicated multinomial expression. The evaluation of such an expression would be extremely difficult, if not impossible, especially for large values of $k_1, k_2, \cdots, k_n$. In order to obtain an approximation to $E_1[\;]$, repeated applications of (4.12) can be made and the resulting expression can be evaluated by means of the curves in section 6.

For convenience, consider $E_1[k_1(p_1), k_2(p_2), k_3(p_3), k_4(p_4)]$. The general result will then be apparent. Assume that the first three boxes form a single unit with probability $(p_1 + p_2 + p_3)$. Then the number of balls required to obtain either $k_1$ in the first, $k_2$ in the second or $k_3$ in the third, if all balls are going in these three boxes, is a random variable $X$. Consequently,

$$E_1[k_1(p_1), \cdots, k_4(p_4)] = E(E_1[X(p_1 + p_2 + p_3), k_4(p_4)]). \tag{4.18}$$

Applying (4.12),

$$E_1[k_1(p_1), \cdots, k_4(p_4)] \cong E_1\!\left[E_1\!\left[k_1\!\left(\tfrac{p_1}{p_1+p_2+p_3}\right), k_2\!\left(\tfrac{p_2}{p_1+p_2+p_3}\right), k_3\!\left(\tfrac{p_3}{p_1+p_2+p_3}\right)\right]\!(p_1+p_2+p_3),\ k_4(p_4)\right]. \tag{4.19}$$

Applying (4.12) once again the final approximation is

$$E_1[k_1(p_1), \cdots, k_4(p_4)] \cong E_1\!\left[E_1\!\left[E_1\!\left[k_1\!\left(\tfrac{p_1}{p_1+p_2}\right), k_2\!\left(\tfrac{p_2}{p_1+p_2}\right)\right]\!\left(\tfrac{p_1+p_2}{p_1+p_2+p_3}\right),\ k_3\!\left(\tfrac{p_3}{p_1+p_2+p_3}\right)\right]\!(p_1+p_2+p_3),\ k_4(p_4)\right]. \tag{4.20}$$

Expression (4.20) can be translated into a course of procedure. One considers the first two boxes and computes

$$a_1 = E_1\!\left[k_1\!\left(\tfrac{p_1}{p_1+p_2}\right), k_2\!\left(\tfrac{p_2}{p_1+p_2}\right)\right].$$

It is then assumed that $a_1$ is a new number associated with a box with probability $(p_1 + p_2)$ and

$$a_2 = E_1\!\left[a_1\!\left(\tfrac{p_1+p_2}{p_1+p_2+p_3}\right), k_3\!\left(\tfrac{p_3}{p_1+p_2+p_3}\right)\right].$$

Repeating this procedure again, one computes $a_3 = E_1[a_2(p_1 + p_2 + p_3), k_4(p_4)]$, and by (4.20) this is approximately equal to $E_1[k_1(p_1), \cdots, k_4(p_4)]$. This method of computation is seen to be completely general and one can apply it to any number of boxes. Each step consists of computing $E_1[\;]$ for two boxes and consequently can be carried out with the curves of section 6. It is evident that the order in which the boxes are taken may have an important effect on the size of the error involved in using this step-by-step procedure. This problem will be considered in section 5.
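The step-by-step procedure can be sketched in code by treating (3.8) as a continuous function of its first integer. The block below is an illustration only: it replaces the curves of section 6 by a Simpson's rule evaluation of the Incomplete Beta-Function (an assumption of this sketch, valid for arguments not less than one), and it compares the result with an exact value obtained from the first-step recursion for small integers. The names `inc_beta`, `approx_e1` and `e1_exact` are hypothetical.

```python
from functools import lru_cache
from math import exp, lgamma


def inc_beta(x, a, b, n=2000):
    """Regularized incomplete beta I_x(a, b) by Simpson's rule (a, b >= 1)."""
    h = x / n
    total = 0.0
    for i in range(n + 1):
        t = i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * t ** (a - 1) * (1 - t) ** (b - 1)
    return total * h / 3 * exp(lgamma(a + b) - lgamma(a) - lgamma(b))


def approx_e1(boxes):
    """Combine (k, p) pairs two at a time, in the order given, as in (4.20)."""
    count, prob = boxes[0]
    for k, p in boxes[1:]:
        total = prob + p
        u, v = prob / total, p / total          # rescaled probabilities
        count = (count / u * inc_beta(u, count + 1, k)
                 + k / v * inc_beta(v, k + 1, count))   # (3.8), continuous in count
        prob = total
    return count / prob      # no change when the probabilities sum to one


def e1_exact(ks, ps):
    """Exact E_1 for integer ks by conditioning on the first throw."""
    @lru_cache(maxsize=None)
    def rec(state):
        if min(state) == 0:
            return 0.0
        return 1.0 + sum(p * rec(state[:i] + (state[i] - 1,) + state[i + 1:])
                         for i, p in enumerate(ps))
    return rec(tuple(ks))


exact = e1_exact((6, 4, 2), (1 / 6, 1 / 3, 1 / 2))
approx = approx_e1([(2, 1 / 2), (4, 1 / 3), (6, 1 / 6)])
print(exact, approx)
```

The order of the list passed to `approx_e1` is the order of combination, so different orderings of the same boxes can be compared by reordering the list.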

It is of interest to note that one can also obtain another approximation for $E_1[k_1(p_1), k_2(p_2), k_3(p_3), k_4(p_4)]$. Suppose that the first two boxes are considered as one unit and the second two boxes as another unit. Then the number of balls which must fall in the first two boxes in order to obtain either $k_1$ in the first box or $k_2$ in the second is a random variable $X_1$. Similarly a random variable $X_2$ can be associated with the last two boxes. Accordingly

$$E_1[k_1(p_1), \cdots, k_4(p_4)] = E(E_1[X_1(p_1 + p_2), X_2(p_3 + p_4)]). \tag{4.21}$$

By use of (4.17), (4.21) can be written as

$$E_1[k_1(p_1), \cdots, k_4(p_4)] \cong E_1\!\left[E_1\!\left[k_1\!\left(\tfrac{p_1}{p_1+p_2}\right), k_2\!\left(\tfrac{p_2}{p_1+p_2}\right)\right]\!(p_1+p_2),\ E_1\!\left[k_3\!\left(\tfrac{p_3}{p_3+p_4}\right), k_4\!\left(\tfrac{p_4}{p_3+p_4}\right)\right]\!(p_3+p_4)\right]. \tag{4.22}$$

This same analysis applies directly to the factorial moments. In particular

$$F_{2,1}[k_1(p_1), \cdots, k_4(p_4)] \cong F_{2,1}\!\left[E_1\!\left[E_1\!\left[k_1\!\left(\tfrac{p_1}{p_1+p_2}\right), k_2\!\left(\tfrac{p_2}{p_1+p_2}\right)\right]\!\left(\tfrac{p_1+p_2}{p_1+p_2+p_3}\right),\ k_3\!\left(\tfrac{p_3}{p_1+p_2+p_3}\right)\right]\!(p_1+p_2+p_3),\ k_4(p_4)\right]. \tag{4.23}$$

From (4.20) and (4.23) an approximate value for $\sigma_1^2[k_1(p_1), k_2(p_2), k_3(p_3), k_4(p_4)]$ can be obtained. This procedure is also perfectly general and so an estimate of $\sigma_1^2[\;]$ can be obtained for any number of boxes.

This same method can be immediately applied to the approximation of $E_n[k_1(p_1), \cdots, k_n(p_n)]$. One simply considers the boxes two at a time, computing $E_2[\;]$ at each stage instead of $E_1[\;]$.

4.5. Solution for $E_s[k^n]$ and $E_s[k_1^{n-1}, k_2]$. When $s$ is different from 1 or $n$, the complexities of the problem force one into the consideration of only the quantities given in the title of this subsection. The corresponding problem for three boxes, namely $E_2[k_1(p_1), k_2(p_2), k_3(p_3)]$, has been treated for general $k_i$ and $p_i$ by McCarthy [5]. However, the resulting expression is so complicated that it will not be given here.

The process to be used consists of reducing the subscript $s$ by a series of steps until the subscript 2 is reached. This expression can then be evaluated by the use of the curves or by simple computation. For the sake of convenience, the case $E_3[k^4]$ will be considered in detail. It will then be possible to write down the expression for general $s$ and $n$.

As a starting point, look upon the first three boxes as a single unit. Then there is a definite probability $\pi_i$ that one of these boxes will have $k$ balls in it for the first time on the $x_i$th throw into these three boxes and that the other two boxes of the unit will each have less than $k$ balls. Then if one of the other two of the three boxes has $u$ balls $(u < k)$ the third box will have $(x_i - k - u)$ balls, $(x_i - k - u < k)$. Meanwhile the fourth box will also have been receiving balls, and the number in it at this time will be denoted by $j$, $(j = 0, 1, 2, \cdots, \infty)$. For each $x_i$ there is a probability associated with $u$, namely $P(u \mid x_i)$, and another probability associated with $j$, $P(j \mid x_i)$. For the moment, consider that box 1 has received $k$ balls, box 2 the $(x_i - k - u)$ balls, box 3 the $u$ balls and box 4 the $j$ balls. This numbering is of course immaterial since the situation is symmetric with respect to the first three boxes.

Now if $j \ge k$, either $(2k + u - x_i)$ balls will be required in the second box or $(k - u)$ balls in the third box in order to obtain three properly occupied boxes. On the other hand, if $j < k$, the specified number will be required in any two of boxes two, three and four. Consequently, with this conditional description of the situation, the required number of balls necessary to obtain three out of the four boxes occupied in the proper manner is

$$x_i + j + E_2[(2k + u - x_i), (k - u), (k - j)], \tag{4.24}$$

where $(k - j)$ will be taken as zero if $j$ is greater than or equal to $k$. From this description, it is evident that the desired mean value may be obtained by summing (4.24) over all possible values of $x_i$, $j$ and $u$. Therefore

$$E_3[k^4] = \sum_i \pi_i \Big\{ x_i + \sum_{j=0}^{\infty} P(j \mid x_i) \Big( j + \sum_u P(u \mid x_i)\, E_2[(2k + u - x_i), (k - u), (k - j)] \Big) \Big\}. \tag{4.25}$$

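It is to be noticed that the probabilities inside the $E_2[\;]$ in (4.24) and (4.25) do not add to one but only to 3/4. This can be easily remedied by the application of a formula similar to (4.6) and the result is obtained that

$$E_3[k^4] = \sum_i \pi_i \Big\{ x_i + \sum_{j=0}^{\infty} P(j \mid x_i) \Big( j + \tfrac{4}{3} \sum_u P(u \mid x_i)\, E_2[(2k + u - x_i), (k - u), (k - j)] \Big) \Big\}, \tag{4.26}$$

where each probability inside $E_2[\;]$ is now 1/3.

By simple considerations

$$P(u \mid x_i) = \frac{\dfrac{(x_i - k)!}{u!\,(x_i - k - u)!} \left(\tfrac{1}{2}\right)^{x_i - k}}{\displaystyle\sum_u \dfrac{(x_i - k)!}{u!\,(x_i - k - u)!} \left(\tfrac{1}{2}\right)^{x_i - k}}, \tag{4.27}$$

where $u$ and $(x_i - k - u)$ are both less than $k$, and

$$P(j \mid x_i) = \frac{(x_i + j - 1)!}{(x_i - 1)!\; j!} \left(\tfrac{3}{4}\right)^{x_i} \left(\tfrac{1}{4}\right)^{j}. \tag{4.28}$$

From (4.27) and (4.28)

$$\sum_j j\, P(j \mid x_i) = x_i/3 \tag{4.29}$$

and

$$\sum_u u\, P(u \mid x_i) = (x_i - k)/2. \tag{4.30}$$

(4.26) can be written as

$$E_3[k^4] = \sum_i \pi_i x_i + \sum_i \pi_i \sum_j j\, P(j \mid x_i) + \tfrac{4}{3} \sum_i \pi_i \sum_j P(j \mid x_i) \sum_u P(u \mid x_i)\, E_2[(2k + u - x_i), (k - u), (k - j)]. \tag{4.31}$$

Finally, making use of (4.29), (4.30), the definition of $x_i$ and $\pi_i$ and the procedure of replacing random variables inside an $E_2[\;]$ by their mean values,

$$E_3[k^4] \cong E_1[k^3] + \frac{E_1[k^3]}{3} + \frac{4}{3}\, E_2\!\left[\frac{3k - E_1[k^3]}{2},\ \frac{3k - E_1[k^3]}{2},\ \left(k - \frac{E_1[k^3]}{3}\right)\right], \tag{4.32}$$

and this in turn can be written as

$$E_3[k^4] \cong \frac{4}{3} \left\{ E_1[k^3] + E_2\!\left[\left(\frac{3k - E_1[k^3]}{2}\right)^{2},\ \left(k - \frac{E_1[k^3]}{3}\right)\right] \right\}. \tag{4.33}$$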
This method of analysis which has just been applied to $E_3[k^4]$ can be used equally well for $E_s[k^n]$. Here one simply considers the first $(n - 1)$ boxes and proceeds as above. The final result is immediately apparent, namely that

$$E_s[k^n] \cong \frac{n}{n-1} \left\{ E_1[k^{n-1}] + E_{s-1}\!\left[\left(\frac{(n-1)k - E_1[k^{n-1}]}{n-2}\right)^{n-2},\ \left(k - \frac{E_1[k^{n-1}]}{n-1}\right)\right] \right\}. \tag{4.34}$$

It will be noticed that in reducing (4.34) further it will be necessary to consider expressions of the form $E_s[k_1^{n-1}, k_2]$. However, it will be seen from the foregoing analysis that no use was made of the fact that the integers attached to the first $(n - 1)$ boxes were the same. Accordingly,

$$E_s[k_1^{n-1}, k_2] \cong \frac{n}{n-1} \left\{ E_1[k_1^{n-1}] + E_{s-1}\!\left[\left(\frac{(n-1)k_1 - E_1[k_1^{n-1}]}{n-2}\right)^{n-2},\ \left(k_2 - \frac{E_1[k_1^{n-1}]}{n-1}\right)\right] \right\}. \tag{4.35}$$

Now, by the use of (4.34) and (4.35), it is possible to reduce $s$ as much as may be desired.


5. Some considerations concerning the error of the approximations.

5.1. Preliminary remarks. This discussion of the errors of the approximations given in the preceding sections has been left until now so that a broad perspective might be gained, and the errors seen in relationship to one another. Such an arrangement is advantageous in this instance since both the analytical and computational results bearing on the subject are scanty, and consequently, any intelligent leads which their inter-relationships can give are most helpful.

The difficulty involved in obtaining exact values for the various quantities considered in this paper has been pointed out quite frequently, and the approximations have been devised to overcome this very difficulty. The same complexity which prevents the computation of many exact values also prevents any effective analytic approach to the problem of evaluating the errors. For these reasons the author has been unable to carry through any general analytic treatment of the errors of the approximations. However, because the intelligent use of approximations requires some knowledge of their accuracy, certain isolated cases have been investigated by a combination of computational, graphical and analytic methods. These investigations are detailed in the remainder of this section, and conjectures concerning the general behavior of the errors are made whenever possible. As has been stated earlier, no consideration will be given to the approximation formulae for the variance.

5.2. Errors of the approximations for $E_1[k_1(p_1), \cdots, k_n(p_n)]$ and $E_n[k_1(p_1), \cdots, k_n(p_n)]$.

Taking $n$ equal to 3, we have from (4.11) that

$$\big|\, E_1[k_1(p_1), k_2(p_2), k_3(p_3)] - E_1[a(p_1 + p_2), k_3(p_3)] \,\big| \le \tfrac{1}{2}\, \sigma_1^2\!\left[k_1\!\left(\tfrac{p_1}{p_1+p_2}\right), k_2\!\left(\tfrac{p_2}{p_1+p_2}\right)\right] \cdot \operatorname{Max} \big|\, E_1''[X(p_1 + p_2), k_3(p_3)] \,\big|, \tag{5.1}$$

where $\operatorname{Max} |E_1''[X(p_1 + p_2), k_3(p_3)]|$ is the maximum absolute value of the second derivative of $E_1[X(p_1 + p_2), k_3(p_3)]$ with respect to $X$, and $a$ is equal to $E_1[k_1(p_1/(p_1 + p_2)), k_2(p_2/(p_1 + p_2))]$. Now an examination of the curves given in section 6 indicates that, for fixed $p_3$ and $k_3$, the maximum curvature of $E_1[X(p_1 + p_2), k_3(p_3)]$, considered as a function of $X$, is a monotone decreasing function of $k_3$. Since this curvature is negative, this geometric observation is equivalent to

$$\operatorname{Max} \big|\, E_1''[X(p_1 + p_2), (k_3 + 1)(p_3)] \,\big| < \operatorname{Max} \big|\, E_1''[X(p_1 + p_2), k_3(p_3)] \,\big|, \tag{5.2}$$

although it is not necessarily true that

$$\big|\, E_1''[X(p_1 + p_2), (k_3 + 1)(p_3)] \,\big| < \big|\, E_1''[X(p_1 + p_2), k_3(p_3)] \,\big|.$$

Moreover,

$$E_1[k_1(p_1), k_2(p_2), k_3(p_3)] < E_1[k_1(p_1), k_2(p_2), (k_3 + 1)(p_3)]. \tag{5.3}$$

From (5.1), (5.2) and (5.3) one readily obtains that the absolute value of the percentage error of the approximation to $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$ is bounded by a function, say $U_1[k_1(p_1), k_2(p_2), k_3(p_3)]$, which is a monotone decreasing function of $k_3$ as $k_3$ increases. It should be noticed that the results of 4.2 have already shown not only that this upper bound for the percentage error approaches zero as $k_3$ becomes infinite, but also that the absolute difference between the true and approximate values approaches zero as $k_3$ becomes infinite.

Computation of $U_1[k_1(p_1), k_2(p_2), k_3(p_3)]$ is very time consuming because of the difficulty in obtaining $\operatorname{Max} |E_1''[X(p_1 + p_2), k_3(p_3)]|$, and because the direct computation of $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$ is laborious when any of $k_1$, $k_2$ and $k_3$ are much larger than 2 or 3. In order to surmount these difficulties and still give some indication of the behavior of $U_1[k_1(p_1), k_2(p_2), k_3(p_3)]$, the following expedients were adopted:

1. The values of $k_1$, $k_2$ and $k_3$ were each fixed at 5.

2. $\operatorname{Max} |E_1''[X(p_1 + p_2), k_3(p_3)]|$ was obtained by graphical means, namely drawing the slopes of the appropriate curve in section 6, graphing these slopes and then taking off the maximum slopes of these curves.

3. $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$ was replaced by its approximation, $E_1[a(p_1 + p_2), k_3(p_3)]$, in the computation of the percentage error. This new bound will be denoted by $U_1^*[k_1(p_1), k_2(p_2), k_3(p_3)]$.

4. Carefully chosen values of $U_1^*[k_1(p_1), k_2(p_2), k_3(p_3)]$ were plotted on triangular coordinates, and contour lines interpolated and extrapolated to cover in large part the range of $p_1$, $p_2$ and $p_3$.

The use of the third of the above listed assumptions is no detriment to the usefulness of the results since

$$\frac{E_{1a}[\;] - E_1[\;]}{E_1[\;]} = \frac{\dfrac{E_{1a}[\;] - E_1[\;]}{E_{1a}[\;]}}{1 - \dfrac{E_{1a}[\;] - E_1[\;]}{E_{1a}[\;]}} \le \frac{U_1^*[k_1(p_1), k_2(p_2), k_3(p_3)]}{100 - U_1^*[k_1(p_1), k_2(p_2), k_3(p_3)]},$$

where $E_{1a}[\;] = E_1[a(p_1 + p_2), k_3(p_3)]$ and $E_1[\;] = E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$. Since $U_1^*[\;]$ is a monotone decreasing function of $k_3$, this new bound on the percentage error is also monotone decreasing for increasing $k_3$. Absolute values were not required in this derivation since $E_{1a}[\;]$ is always greater than or equal to $E_1[\;]$, as is apparent from (5.1) and an examination of the curves of section 6. The contours of $U_1^*[5(p_1), 5(p_2), 5(p_3)]$ are shown in Fig. 1. The interpretation of this figure is very straightforward. For example, for $p_3 \le .5$, the value $U_1^*[5(p_1), 5(p_2), 5(p_3)]$ is less than 5.0%. Making use of the definition of $U_1^*[\;]$, and especially its monotone characteristic, one can then say: the approximation for $E_1[5(p_1), 5(p_2), k_3(p_3)]$, where $k_3 \ge 5$, $p_3 \le .50$, is in error by not more than 5.3%. Moreover, as has been already observed, $E_1[a(p_1 + p_2), k_3(p_3)]$ is always greater than or equal to $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$.

Fig. 1. Contours of $U_1^*[5(p_1), 5(p_2), 5(p_3)]$ Considered as a Function of $p_1$, $p_2$ and $p_3$

It will be noticed from Fig. 1 that $U_1^*[\;]$ is increasing steadily as $p_3$ approaches 1. It has been demonstrated by McCarthy [5] that this behavior of the upper bound does not mean that the percentage error itself becomes larger as $p_3$ approaches 1. As a matter of fact, for fixed $k_1$, $k_2$ and $k_3$, the percentage error approaches zero as $p_3$ approaches 1. However, this demonstration does not furnish any reasonable bounds with which to fill in the lower left hand corner of Fig. 1. This fact is not as serious as it may at first seem because there is nothing to prevent one from reordering the boxes. For example, consider $E_1[5(.2), 5(.2), 5(.6)]$. From Fig. 1, the error of the approximation for this quantity, namely $E_1[E_1[5(.5), 5(.5)](.4), 5(.6)]$, is not more than approximately

$$7.5/(100 - 7.5) = 8.1\%.$$

On the other hand this same figure shows that $E_1[E_1[5(.25), 5(.75)](.80), 5(.20)]$, which is also an approximation to $E_1[5(.2), 5(.2), 5(.6)]$, is in error by not more than approximately 8%. Consequently one would choose the second ordering.

The procedure which has been used to obtain an upper bound on the percentage error of the approximation to $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$, $k_1$ and $k_2$ fixed and $k_3$ greater than or equal to that integer at which the bound is evaluated, can also be applied to $E_3[k_1(p_1), k_2(p_2), k_3(p_3)]$. All the assumptions remain the same and in this case the bounds corresponding to $U_1[\;]$ and $U_1^*[\;]$ are denoted by $U_3[\;]$ and $U_3^*[\;]$. As in the case of $U_1[\;]$, we have

$$\frac{E_3[\;] - E_{3b}[\;]}{E_3[\;]} = \frac{\dfrac{E_3[\;] - E_{3b}[\;]}{E_{3b}[\;]}}{1 + \dfrac{E_3[\;] - E_{3b}[\;]}{E_{3b}[\;]}} \le \frac{U_3^*[k_1(p_1), k_2(p_2), k_3(p_3)]}{100}.$$

Here the approximation, $E_{3b}[\;] = E_2[b(p_1 + p_2), k_3(p_3)]$, is always less than or equal to the exact value, $E_3[k_1(p_1), k_2(p_2), k_3(p_3)]$. The contours of $U_3^*[5(p_1), 5(p_2), 5(p_3)]$ are shown in Fig. 2. In using $U_3^*[5(p_1), 5(p_2), 5(p_3)]$ it is sometimes advantageous to reorder the boxes. For example, consider $E_3[5(.2), 5(.2), 5(.6)]$. Fig. 2 shows that, as an approximation, $E_2[E_2[5(.5), 5(.5)](.4), 5(.6)]$ is in error by not more than approximately 9%. However, $E_2[E_2[5(.25), 5(.75)](.80), 5(.20)]$, which is also an approximation for $E_3[5(.2), 5(.2), 5(.6)]$, is in error by not more than about 7%. There is a gain here, but it is not as great as the corresponding situation for $E_1[5(.2), 5(.2), 5(.6)]$.

As has already been stated, one may minimize the error by correctly choosing the two boxes which are to be combined first. Some discussion will be given here of a procedure for choosing these two boxes. Of course an experimental scheme may be used which makes use of the fact that the approximation to $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$ is always an overestimate. In other words, that grouping is used which gives rise to the smallest value of the approximation. However, this can be replaced by a few preliminary computations.

As can be seen from (5.1), the error of the approximation depends upon two quantities, namely the variance of the two box situation obtained by combining two of the boxes, and the maximum value of the second derivative of the curve representing the function $E_1[X(p_1 + p_2), k_3(p_3)]$ over the proper range of $X$ values. The error will be zero if $E_1[X(p_1 + p_2), k_3(p_3)]$ is either a constant or linear in $X$ over the range of $X$ values in which one is interested, that is $k_1 \le X \le k_1 + k_2 - 1$, $k_1 \le k_2$. If this is not possible, then one wishes to make it as near so as possible, subject to the restriction that

$$\sigma_1^2[k_1(p_1/(p_1 + p_2)), k_2(p_2/(p_1 + p_2))]$$

is not unnecessarily large.



Fig. 2. Contours of $U_3^*[5(p_1), 5(p_2), 5(p_3)]$ Considered as a Function of $p_1$, $p_2$ and $p_3$

An indication of the relationship between the boxes for both linearity and contribution to variance can be obtained from expressions (3.10) and (3.11). Thus for each box one computes $k_i/p_i$ and $k_i(1 - p_i)/p_i^2$. Then in order to most nearly achieve linearity one orders the boxes in accordance with the increasing order of $k_i/p_i$ and combines them in that order. If there is a tie between two or more boxes with respect to the $k_i/p_i$ ordering, then one orders these "tied" boxes in accordance with increasing $k_i(1 - p_i)/p_i^2$.
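The ordering rule just described amounts to a two-key sort. The sketch below is illustrative, with the hypothetical name `combination_order`; it uses exact fractions so that ties in $k_i/p_i$ are detected reliably, which floating point division would not guarantee.

```python
from fractions import Fraction


def combination_order(boxes):
    """Sort (k, p) pairs by k/p, breaking ties by k(1 - p)/p**2."""
    return sorted(
        boxes,
        key=lambda kp: (kp[0] / kp[1], kp[0] * (1 - kp[1]) / kp[1] ** 2),
    )


boxes = [(2, Fraction(1, 6)), (4, Fraction(1, 3)), (6, Fraction(1, 2))]
print([k for k, _ in combination_order(boxes)])
```

The boxes are then combined from the front of the returned list, the first two being combined first.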

Some computations have been carried out to illustrate these points and they 
are given in Table 1. Themotation ((2, 4), 6) means that one first combines the 
boxes with integers 2 and 4, and then combines this result with the box with 
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associated integer 6. All values in this table were obtained by direct computa¬ 
tion No use of the curves was made. 

In these three situations, one obtains the values given in Table 2.
Thus in the first case there is nothing to choose with respect to $k_i/p_i$, but
$k_i(1 - p_i)/p_i^2$ indicates the ordering ((6, 4), 2). Actually the percentage error
in this instance is 1.0 as compared with 1.7 and 2.4 for the other two orderings.
In case two, $k_i/p_i$ indicates the ordering ((2, 6), 4). Although this does not
turn out to be the best ordering, Table 1 shows that the ordering in this instance
makes little difference. In the last case, the indicated ordering is ((2, 4), 6)
and the percentage error for this is zero, as opposed to 1.3 and 1.6. Since at
any stage in the operation of combining boxes two at a time (4.13) holds, the

TABLE 1

Effect of Order of Combination on Error of Approximation

                                                                        % Error of Approximation
                                                                          Order of Combination
$p_1 = 1/6$   $p_2 = 1/3$   $p_3 = 1/2$
   $k_1$         $k_2$         $k_3$      $E_1[k_1(p_1), k_2(p_2), k_3(p_3)]$    ((2, 4), 6)   ((2, 6), 4)   ((4, 6), 2)

     2             4             6                     6.96                         +1.7          +2.4          +1.0
     4             6             2                     3.92                         +0.3          +0.5          +0.5
     6             4             2                     3.77                         +0.0          +1.3          +1.6


TABLE 2

$p_i$                   1/6   1/3   1/2       1/6   1/3   1/2       1/6   1/3   1/2
$k_i$                    2     4     6         4     6     2         6     4     2
$k_i/p_i$               12    12    12        24    18     4        36    12     4
$k_i(1 - p_i)/p_i^2$    60    24    12       120    36     4       180    24     4


above procedure will also give the minimum error for the approximation to
$E_3[k_1(p_1), k_2(p_2), k_3(p_3)]$. Moreover, the approximation for this quantity is always
an underestimate of the true value, and therefore that ordering should be taken which
gives the greatest value for the approximation.

When the error of the approximation to $E_1[k_1(p_1), \cdots, k_n(p_n)]$ and

$E_n[k_1(p_1), \cdots, k_n(p_n)]$,

for $n$ greater than three, is considered, it is immediately obvious that the general
considerations already given in this section still apply. In addition to these
considerations, there is the difficulty that errors may cumulate. However, the
results already quoted for three boxes, in conjunction with those which are to
be given in 5.3, indicate that this cumulation is not serious. There are two
factors which eventually prevent (i.e. as more and more boxes are considered)
this percentage error from becoming unduly large, and, in fact, make it approach
zero. These are:

1. The value of $p_3$ will, in most instances, be decreasing as more and more
boxes are considered (see Fig. 1), and
2. The true value is usually becoming larger and larger as more and more boxes
are considered.

In order to minimize the error, the following precautions should be taken:
1. At each stage in the computation, try to avoid, as much as possible, making
readings where $E_1[X(p_1 + p_2), k_3(p_3)]$ is curving sharply. If all readings are
made where the curves are nearly linear, the percentage error will be very close
to zero. On the other hand, if many readings must be made where the slopes
of the curves are changing most sharply, larger errors must be expected.
2. Use that ordering of the boxes which provides the minimum value for the
approximation to $E_1[\ ]$ or the maximum value for the approximation to $E_n[\ ]$.
3. In order to approximate the ordering which (2) would give, compute
$k_i/p_i$ and $k_i(1 - p_i)/p_i^2$ at each stage at which two boxes are to be combined
and use the rules of procedure already given for three boxes.

5.3. Error of the approximation for $E_s[k^n]$. Repeated applications of the
reduction formulae (4.34) and (4.35) allow one to evaluate $E_s[k^n]$ by means of the
solution for the two box case, or more explicitly, by means of the curves given in
section 6. Here the error of this approximation will be discussed primarily from
a computational point of view.

$E_s[1^n]$ can be treated in detail since it is possible to obtain exact values for this
expression by means of (4.1). This has been done by McCarthy [5], but the
details will not be repeated here because of lack of space. The results simply
add more credence to the conjectures which will soon be made.

When $k$ is taken to be larger than one, the difficulty arises that it is almost
impossible to compute the exact value of $E_s[k^n]$ in a large number of cases.
Consequently it was necessary to devise an experimental model to estimate these
exact values so that the amount of error would be known within bounds. A
set of 10,000 punched cards¹ was obtained on which were recorded 100,000
random numbers drawn from a rectangular distribution. Thus if the cards are
ordered on a particular set of columns, and one reads off the digits 0-9 on another
specified column, one card at a time, it is equivalent to using a table of random
numbers such as those prepared by Tippett [7]. By the use of these cards, it
was possible to run off on an IBM Tabulator any desired number of experiments
in order to obtain an experimental distribution from which to calculate an
estimate of $E_s[k^n]$ and the variance of this estimate. For example, in determining
an estimate of $E_1[2^5]$ one hundred experimental trials were made, as described
above, with the following results:


Number of Trials
    Required          Frequency

       2                 23
       3                 32
       4                 31
       5                 11
       6                  3

¹ These punched cards were prepared at Mayo Clinic, Rochester, Minn., under the
direction of Doctor Joseph Berkson.

From this distribution the estimate of $E_1[2^5]$ is 3.39, with a variance computed
from the distribution of .011. The 95% symmetric confidence limits for the
mean, computed from the Student t-distribution, are 3.17 and 3.61. Such
estimates will be used in the remainder of this section. It should be pointed
out that in order to prevent a prohibitive amount of machine time, it was


TABLE 3

Percentage Errors for $E_s[k^n]$

 s     k        n = 3             n = 4              n = 5

 1     1          —                 —                  —
       2        + .7              +2.2           - .3   +13.6
       5        +1.1           -3.1   +5.7       + 6    +10.7
      10                                         -2.9   + 5.1

 2     1        -5.6              - 4               +1.3
       2        -4.6           -4.4   +4.4       + .6   +10.4
       5     -4.6   +1.7       +3.0   +9.3       +7.9   +14.8
      10     -3.7   +2.1       - .3   +5.5       +4.3   +10.7
      15                       +1.0   +7.2
      20     -2.5   +2.4

 3     1       -18.2             -12.7              -3.1
       2       - 6.3          -16.5   -7.3       -2.9   + 6.0
       5     -9.7   -2.2      -10.7   -5.5       + .8   + 5.8
      10                                         -2.1   + 3.1

 4     1                         -12.0             -15.6
       2                      -13.6   +6.1      -11.6   - 3.9
       5                      -13.9   -7.2       -9.9   - 4.0
      10                      - 8.9   -2.6       -6.4   - 1.2

 5     1                                            -8.8
       2                                        -18.1   - 6.0
       5                                        -12.5   - 5.6
      10                                         -8.9   - 2.9


necessary to use many of the same runs to determine values of $E_s[k^n]$ for different
values of $s$, $k$ and $n$. This means that the errors are correlated to some slight
extent, but it would be extremely difficult to determine how much.
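The estimate and its variance quoted for the one hundred experimental trials can be recomputed from the frequency table. The following is a minimal sketch, assuming the tabulated frequencies are as printed and that the variance of the estimate is the unbiased sample variance divided by the number of trials; the function name is illustrative only.

```python
# Frequency table from the text: number of trials required -> frequency.
FREQUENCIES = {2: 23, 3: 32, 4: 31, 5: 11, 6: 3}


def mean_and_variance_of_mean(freq):
    """Return the sample mean and the estimated variance of that mean."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    # Unbiased sample variance of a single experimental trial.
    s2 = sum(f * (x - mean) ** 2 for x, f in freq.items()) / (n - 1)
    return mean, s2 / n


mean, var_of_mean = mean_and_variance_of_mean(FREQUENCIES)
print(mean)          # 3.39, the estimate quoted in the text
print(var_of_mean)   # variance of the estimate
```

Confidence limits of the kind used in Table 3 would follow by adding and subtracting a Student t multiple of the square root of the second value; the multiplier is left out here because the text does not state the one that was used.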

A summary of the computed percentage errors for various values of $s$, $k$ and
$n$ is given in Table 3. In the instances where there are two entries, they are
calculated on the basis of the 95% confidence limits for the experimental mean.
These confidence limits are symmetric and were determined by using the Student
t-distribution. For $k$ equal to 2 and 5 the distribution contained 100 trials,
while for $k$ greater than 5, the distributions were made up of approximately 50
trials.

The computations given in this table show for various values of $s$, $k$ and $n$
the percentage error of the approximation for $E_s[k^n]$. In addition to showing
the values of these percentage errors, the computations lead one to conjecture
that

1. For fixed $s$ and $k$, there exists an $n_0$ such that for $n > n_0$ the absolute value
of the percentage error of the approximation for $E_s[k^n]$ is a monotone decreasing
function for increasing $n$. It was shown by McCarthy [5] that this absolute
value approaches zero as $n$ approaches infinity for $E_s[1^n]$, and in fact, that the
difference between the true and approximate values approaches zero.

2. For fixed $s$ and $n$, there exists a $k_0$ such that for $k > k_0$, the absolute value
of the percentage error of the approximation for $E_s[k^n]$ is a monotone decreasing
function for increasing $k$.

6. Computation. 

6.1. Curves to aid in the computation of $E_1[k_1(p_1), k_2(p_2)]$. In 3.1 it was shown
that $E_1[k_1(p_1), k_2(p_2)]$ is equal to

$\frac{k_1}{p_1} I_{p_1}(k_1 + 1, k_2) + \frac{k_2}{p_2} I_{p_2}(k_2 + 1, k_1)$,

where $I_x(p, q)$ is the Incomplete Beta-Function as tabled by Karl Pearson [6].
There are three principal difficulties connected with the use of these tables as
they apply to the approximations of this paper. These are:

1. The tables must be available, 

2. The tables give directly only values for integer or half-integer values of
$k_1$ and $k_2$, and

3. Since many different values of $E_1[k_1(p_1), k_2(p_2)]$ are often required to obtain
a single approximation, the computational burden would be very heavy.
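For integer $k_1$ and $k_2$ the two-box mean can be evaluated directly, without tables. The sketch below rests on two assumptions that are not spelled out at this point of the text: that the expression from 3.1 reads $(k_1/p_1) I_{p_1}(k_1 + 1, k_2) + (k_2/p_2) I_{p_2}(k_2 + 1, k_1)$ with $p_1 + p_2 = 1$, and that $E_1$ is the mean number of throws until either box first reaches its quota. The function names are illustrative. The second routine sums the probabilities that neither quota has been reached after $t$ throws, which gives an independent check on the first.

```python
from math import comb


def inc_beta_int(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a and b,
    using the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1))


def e1_closed_form(k1, p1, k2, p2):
    """Two-box mean from the incomplete beta expression (assumed reading)."""
    return (k1 / p1) * inc_beta_int(p1, k1 + 1, k2) + (k2 / p2) * inc_beta_int(p2, k2 + 1, k1)


def e1_direct(k1, p1, k2, p2):
    """Mean number of throws until box 1 holds k1 balls or box 2 holds k2 balls,
    as the sum over t of P(neither quota reached after t throws)."""
    total = 0.0
    for t in range(k1 + k2 - 1):
        low, high = max(0, t - k2 + 1), min(k1 - 1, t)
        total += sum(comb(t, a) * p1 ** a * p2 ** (t - a) for a in range(low, high + 1))
    return total


print(round(e1_closed_form(2, 0.4, 5, 0.6), 4))  # 4.2224
print(round(e1_direct(2, 0.4, 5, 0.6), 4))       # 4.2224
```

Both routines return 4.2224 for $k_1 = 2$, $k_2 = 5$, $p_1 = .40$, $p_2 = .60$, the computed value quoted in the worked example of this section.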

In order to surmount these difficulties, it seemed advisable to prepare curves
giving the values of $E_1[k_1(p_1), k_2(p_2)]$ for various values of $k_1$, $k_2$, $p_1$ and $p_2$.
These curves would give values of $E_1[\ ]$ with sufficient accuracy for most
problems not only for integer values of $k_1$ and $k_2$, but for all values over the range
considered.

Such curves have been prepared by computing $E_1[k_1(p_1), k_2(p_2)]$ for integral
values of $k_1$ and $k_2$ (for fixed $p_1$ and $p_2$) and then joining these points with a
smooth curve. A summary of the graphs prepared is as follows:



                 $k_1$                    $k_2$             $p_1$     $p_2$

Fig. 3     1, 2, ..., 25, ∞      1, 2, ..., 35       .50      .50
Fig. 4     1, 2, ..., 20, ∞      1, 2, ..., 35       .40      .60
Fig. 5     1, 2, ..., 15, ∞      1, 2, ..., 35       .20      .80
Fig. 6     1, 2, ..., 10, ∞      1, 2, ..., 15       .80      .20
Fig. 7     1, 2, ..., 7, ∞       1, 2, ..., 15       .60      .40
Fig. 8     1, 2, ..., 8, ∞       1, 2, ..., 15       .50      .50
Fig. 9     1, 2, ..., 6, ∞       1, 2, ..., 15       .40      .60
Fig. 10    1, 2, ..., 5, ∞       1, 2, ..., 15       .20      .80


Figures 8, 9, and 10 are simply portions of figures 3, 4 and 5 drawn on an
expanded scale in order to permit greater accuracy in reading the curves. Also
figures 6 and 10 and figures 7 and 9 form pairs in that a member of one pair can
be obtained from the other member of the pair. Both members of the pair are
given on the expanded scale in order to facilitate interpolation. Values of the
mean for combinations of $k_1$ and $k_2$ not given directly can usually be obtained
extremely poor.

As an example, suppose one has two boxes with $k_1 = 2$, $k_2 = 5$, $p_1 = .40$ and
$p_2 = .60$. Consulting Fig. 9, one goes along the horizontal axis to $k_2 = 5$.
Following up the vertical line through this point to the curve $k_1 = 2$,
$E_1[2(.40), 5(.60)]$ is read as 4.25. The actually computed value to four decimals is 4.2224.


where $\pi_i$ is the probability that either $k_1$ balls are obtained in the first box or $k_2$
balls are obtained in the second box on the $x_i$th throw for the first time, assuming
balls can go only in boxes one and two, and $x_i$ takes on values

$k_1, k_1 + 1, \cdots, k_1 + k_2 - 1$

when $k_1 \le k_2$. Now $\pi_i$ can be easily computed and $E_1[x_i(p_1 + p_2), k_3(p_3)]$
can be obtained from the curves. The only difficulty in using this procedure
arises when the range of $x_i$ is large. Then a large amount of computation is
involved.

In order to illustrate this computation, consider $E_1[2(.1), 3(.1), 5(.8)]$. Here
$x_i$ takes on the values 2, 3 and 4. We have $x_1 = 2$, $\pi_1 = 2/8$; $x_2 = 3$, $\pi_2 = 3/8$;
and $x_3 = 4$, $\pi_3 = 3/8$. From Fig. 6

$E_1[2(.2), 5(.8)] = 5.09$
$E_1[3(.2), 5(.8)] = 5.88$
$E_1[4(.2), 5(.8)] = 6.11$.

Consequently, $E_1[2(.1), 3(.1), 5(.8)]$ is equal to

(5.09)(2/8) + (5.88)(3/8) + (6.11)(3/8) = 5.77.

Using computed values for $E_1[x_i(.2), 5(.8)]$, $E_1[2(.1), 3(.1), 5(.8)]$ is equal to
5.75. Thus the use of the curves has only led to an error of .3%.
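The probabilities 2/8, 3/8, 3/8 and the weighted mean 5.77 in this example can be reproduced mechanically. The sketch below is an illustration, not the authors' procedure as such: `stopping_distribution` (a hypothetical name) follows the definition of the probabilities given above by tracking the ball counts in boxes one and two throw by throw, with throws restricted to those two boxes, and `weighted_mean` then averages the curve readings.

```python
from fractions import Fraction


def stopping_distribution(k1, p1, k2, p2):
    """P(x = t) for the first throw t on which box 1 reaches k1 balls or box 2
    reaches k2 balls, when balls can fall only in boxes one and two."""
    q1 = p1 / (p1 + p2)          # conditional probability of box one
    q2 = 1 - q1
    states = {(0, 0): Fraction(1)}
    dist = {}
    t = 0
    while states:
        t += 1
        nxt = {}
        for (a, b), prob in states.items():
            for (na, nb), q in (((a + 1, b), q1), ((a, b + 1), q2)):
                if na == k1 or nb == k2:
                    dist[t] = dist.get(t, 0) + prob * q   # a quota is reached on throw t
                else:
                    nxt[(na, nb)] = nxt.get((na, nb), 0) + prob * q
        states = nxt
    return dist


def weighted_mean(dist, readings):
    """Average the two-box means read for each x, weighted by P(x)."""
    return sum(float(dist[x]) * readings[x] for x in dist)


dist = stopping_distribution(2, Fraction(1, 10), 3, Fraction(1, 10))
curve_readings = {2: 5.09, 3: 5.88, 4: 6.11}   # values read from Fig. 6 in the text
print(dist)
print(round(weighted_mean(dist, curve_readings), 2))   # 5.77
```

The unrounded weighted mean is 5.76875, so against the value 5.75 obtained with computed means the discrepancy is about one third of one percent.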
6.3. Use of the curves in approximating $E_1[k_1(p_1), \cdots, k_n(p_n)]$, $E_n[k_1(p_1), \cdots, k_n(p_n)]$


and $E_s[k^n]$. In illustrating the application of the curves and the reduction
formulae (4.34) and (4.35), one example will be worked through in detail. This
example will provide illustrations of all the details involved in such problems.
Consider $E_4[5^5]$. Applying formula (4.34),

(6.2)  $E_4[5^5] \simeq \frac{5}{4} \left\{ E_1[5^4] + E_3\left[ \left( 5 - \frac{E_1[5^4] - 5}{3} \right)^3, \left( 5 - \frac{E_1[5^4]}{4} \right) \right] \right\}$.



Consequently, the first step must be to compute $E_1[5^4]$. Using the principles of
4.4,

(6.3)  $E_1[5^4] \simeq E_1[E_1[5^2](.50), 5(.25), 5(.25)]$.

From Fig. 8, $E_1[5^2] = 7.55$. Therefore $E_1[5^4]$ is approximately equal to

$E_1[7.55(.50), 5(.25), 5(.25)]$.

Now applying the same principle again,

(6.4)  $E_1[5^4] \simeq E_1[E_1[7.55(\tfrac{2}{3}), 5(\tfrac{1}{3})](.75), 5(.25)]$.

By the use of figures 7, 8, 9 and 10, graphical interpolation may be applied to
find that $E_1[7.55(\tfrac{2}{3}), 5(\tfrac{1}{3})]$ is equal to 9.84. The approximation procedure now
says that

(6.5)  $E_1[5^4] \simeq E_1[9.84(.75), 5(.25)]$.

Again applying the curves and using graphical interpolation for $p_1$ and $p_2$,
$E_1[5^4] \simeq 11.88$.

Substituting this value in (6.2),

(6.6)  $E_4[5^5] \simeq \frac{5}{4} \{11.88 + E_3[2.71, 2.71, 2.71, 2.03]\}$.

Now formula (4.35) must be applied to $E_3[2.71, 2.71, 2.71, 2.03]$, i.e.

(6.7)  $E_3[2.71, 2.71, 2.71, 2.03] \simeq \frac{4}{3} \left\{ E_1[(2.71)^3] + E_2\left[ \left( 2.71 - \frac{E_1[(2.71)^3] - 2.71}{2} \right)^2, \left( 2.03 - \frac{E_1[(2.71)^3]}{3} \right) \right] \right\}$.

$E_1[(2.71)^3]$ can be evaluated by the same method used for $E_1[5^4]$. This leads to
the result

(6.8)  $E_3[2.71, 2.71, 2.71, 2.03] \simeq \frac{4}{3} \{4.40 + E_2[1.86, 1.86, .56]\}$.

Once more applying (4.35),

(6.9)  $E_2[1.86, 1.86, .56] \simeq \frac{3}{2} \left\{ E_1[(1.86)^2] + E_1\left[ (2 \cdot 1.86 - E_1[(1.86)^2]), \left( .56 - \frac{E_1[(1.86)^2]}{2} \right) \right] \right\}$.

$E_1[1.86, 1.86]$ is equal, by the curves, to 2.25. Therefore

(6.10)  $E_2[1.86, 1.86, .56] \simeq \frac{3}{2} \{2.25 + E_1[1.47, -.56]\}$.

However, since the convention is observed that a negative quantity is replaced
by zero,

(6.11)  $E_1[1.47, -.56] = E_1[1.47, 0] = 0$.

Now working back through these various expressions,

(6.12)  $E_4[5^5] \simeq \frac{5}{4} [11.88 + \frac{4}{3} [4.40 + \frac{3}{2} [2.25 + 0]]] = 27.81$.

From Table 3 it can be seen that the percentage errors for this approximation
to $E_4[5^5]$, corresponding to the 95% confidence limits for this quantity, are -4.0%
and -9.9%.
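The arithmetic of this worked example can be retraced as follows. This is a sketch under assumptions: `back_substitute` simply evaluates the nested brackets of (6.12), while `remaining_quotas` encodes one reading of how the arguments 2.71, 2.03, 1.86 and .56 arise (each of the m equal boxes is reduced by its share of the throws beyond the one that was filled, and the extra box by its share of all the throws). That reading is inferred from the printed numbers, not taken from formulae (4.34) and (4.35), which are not reproduced in this section; both function names are hypothetical.

```python
def back_substitute(steps, innermost=0.0):
    """Evaluate factor_1 * (e_1 + factor_2 * (e_2 + ... )) from the inside out.

    steps lists (factor, E_1 value) pairs from the outermost reduction inward.
    """
    value = innermost
    for factor, e1_value in reversed(steps):
        value = factor * (e1_value + value)
    return value


def remaining_quotas(k_equal, m, k_extra, e1_value):
    """Assumed reading of one reduction step: quotas left for the m - 1 unfilled
    equal boxes and for the extra box after the first of the m boxes is filled."""
    return k_equal - (e1_value - k_equal) / (m - 1), k_extra - e1_value / m


# Curve readings quoted in the text: E_1[5^4], E_1[(2.71)^3], E_1[1.86, 1.86].
steps = [(5 / 4, 11.88), (4 / 3, 4.40), (3 / 2, 2.25)]
print(round(back_substitute(steps), 2))        # 27.81

print(remaining_quotas(5, 4, 5, 11.88))        # close to (2.71, 2.03)
print(remaining_quotas(2.71, 3, 2.03, 4.40))   # close to (1.86, .56)
print(remaining_quotas(1.86, 2, 0.56, 2.25))   # close to (1.47, -.56)
```

The last pair contains a negative quota, which is the case the text resolves by replacing the negative quantity with zero.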

This example has illustrated most of the situations which will arise in the use 
of the approximations of this paper. 

6.4. Miscellaneous approximation formulae useful for computation. There
exists a relatively simple approximation to $E_1[k_1(p_1), k_2(p_2)]$, $p_1 + p_2 = 1$, when
$p_2$ is near one. Using (3.8) and making some obvious simplifications, one
obtains

$E_1[k_1(p_1), k_2(p_2)] = \frac{k_2}{p_2} + \frac{1}{p_1 p_2} \frac{(k_1 + k_2)!}{(k_1 - 1)!\,(k_2 - 1)!} \int_0^{p_1} t^{k_1 - 1} (1 - t)^{k_2 - 1} (t - p_1)\, dt$.

Since $p_1$ is near zero, $(1 - t)$ can be replaced by one, and the result is obtained
that

$E_1[k_1(p_1), k_2(p_2)] \simeq \frac{k_2}{p_2} - \frac{1}{p_2}\, p_1^{k_1} \frac{(k_1 + k_2)!}{(k_1 + 1)!\,(k_2 - 1)!}$.


An approximation to the Incomplete Beta-Function, given by Tukey and
Scheffé [8], may also prove useful at times. The expression, changed slightly
by those authors since publication, is


Un - r + 1 ’ r) ~ 1 -wf)J 0 X “ (l) e ~' XV2) d * ’ 

where 


2 

x« 


2 r 


(1 _ 5) !L±J 

_ r 

■s/b 


1 


+ 2 r. 


The right hand side of the first expression will be recognized as the $\chi^2$
distribution with $2r$ degrees of freedom. In the event that the tables of $\chi^2$ are not
adequate for the application of these expressions, the approximation of Wilson and
Hilferty [10] should be used. This approximation states that $(\chi^2/\nu)^{1/3}$, where
$\nu$ is the number of degrees of freedom, is approximately normally distributed
with mean $1 - 2/(9\nu)$ and variance $2/(9\nu)$, for large $\nu$.
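The Wilson and Hilferty approximation is easily put into computational form. The sketch below is an illustration with a hypothetical function name; it uses only the mean and variance stated above and the normal probability integral expressed through the error function.

```python
import math


def chi2_cdf_wilson_hilferty(x, nu):
    """Approximate P(chi-square with nu degrees of freedom <= x) by treating
    (chi-square / nu)^(1/3) as normal with mean 1 - 2/(9 nu) and variance 2/(9 nu)."""
    mean = 1 - 2 / (9 * nu)
    sd = math.sqrt(2 / (9 * nu))
    z = ((x / nu) ** (1 / 3) - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# For two degrees of freedom the exact probability is 1 - exp(-x/2),
# which gives a simple check even though nu is far from large.
x = 5.991
print(chi2_cdf_wilson_hilferty(x, 2), 1 - math.exp(-x / 2))
```

Even at two degrees of freedom the two printed values agree to about two decimal places, and the agreement improves as the number of degrees of freedom grows.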
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THE DISTRIBUTION OF THE RANGE 1 

By E. J. Gumbel
Brooklyn College, N. Y. 

1. Summary. The asymptotic distribution of the range $w$ for a large sample
taken from an initial unlimited distribution possessing all moments is obtained
by the convolution of the asymptotic distribution of the two extremes. Let $\alpha$
and $u$ be the parameters of the distribution of the extremes for a symmetrical
variate, and let $R = \alpha(w - 2u)$ be the reduced range. Then its asymptotic
probability $\Psi(R)$ and its asymptotic distribution $\psi(R)$ may be expressed by the
Hankel function of order one and zero. A table is given in the text.

The asymptotic distribution $g(w)$ of the range proper is obtained from $\psi(R)$
by the usual linear transformation. The initial distribution and the sample
size influence the position and the shape of the distribution of the range in the
same way as they influence the distribution of the largest value. If we take the
parameters from the calculated means and standard deviations, the asymptotic
distribution of the range gives a good fit to the calculated distributions for normal
samples from size 6 onward. Consequently the distribution of the range for
normal samples of any size larger than 6 may be obtained from the asymptotic
distribution of the reduced range.

The asymptotic probabilities and the asymptotic distributions of the mth 
range and of the range for asymmetrical distributions are obtained by the same 
method and lead to integrals which may be evaluated by numerical methods. 

2. Introduction. For any initial distribution, and any sample size $n$, the
distribution of the range may easily be written down in the form of an integral.
However, for many given initial distributions the integration can be carried out,
if at all, only for very small sample sizes, say $n = 2$ or $n = 3$. For larger
samples, complicated numerical calculations have to be made, and there is no
way of obtaining the distribution for $n + 1$ observations from the distribution
for $n$ observations.

Our object is to obtain the asymptotic distribution of the range. Nothing is
supposed to be known about the initial distribution, except that it is of the
exponential type [9] which assures that it is unlimited in both directions, and
possesses all moments. It will be shown that this condition is sufficient for the
existence of an asymptotic distribution of the range.
With increasing sample sizes the distribution of the range may approach its
asymptotic form in a quick, or in a slow way. This behavior depends upon the
nature of the initial distribution. Two examples for this approach will be
shown.


¹ Research done with the support of a grant from the Social Science Research Council.

3. The exact distribution of the range. Let $\varphi(x)$ be any initial distribution,
$\Phi(x)$ the probability of a value equal to, or less than, $x$. Then, for samples
of size $n$, the joint distribution $\mathfrak{w}_n(x_1, x_n)$ of the smallest value $x_1$ and the largest
value $x_n$ is

(1)  $\mathfrak{w}_n(x_1, x_n) = n(n - 1)\,\varphi(x_1)\,(\Phi(x_n) - \Phi(x_1))^{n-2}\,\varphi(x_n)$.

The distribution $g_n(w_n)$ of the range $w_n$ defined by

(2)  $x_n = x_1 + w_n$

is obtained by integrating over all values $x_1 \le x_n$, whence

(3)  $g_n(w_n) = n(n - 1) \int_{-\infty}^{+\infty} (\Phi(x + w_n) - \Phi(x))^{n-2}\,\varphi(x + w_n)\,\varphi(x)\, dx$,

where the index 1 has been dropped. The probability $G_n(w_n)$ for the range to
be equal to, or less than, $w_n$ is obtained by integration of (3), whence, by
reversing the order of integration,

$G_n(w_n) = n \int_{-\infty}^{+\infty} \int_0^{w_n} (n - 1)(\Phi(x + w) - \Phi(x))^{n-2}\, d\Phi(x + w)\, d\Phi(x)$,

or, after integration,

$G_n(w_n) = n \int_{-\infty}^{+\infty} (\Phi(x + w_n) - \Phi(x))^{n-1}\, d\Phi(x)$,

a formula to which Prof. H. Hotelling has drawn my attention. The beauty of
this formula is completely marred by the facts that, in general, we cannot express
$\Phi(x + w_n)$ by $\Phi(x)$, and that the numerical integration is lengthy and tiresome.
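The single-integral form of the probability of the range lends itself to direct numerical evaluation. The following sketch assumes a standard normal initial distribution and a plain trapezoid rule; the integration limits, the step count and the function names are choices made here for illustration. For $n = 2$ the range is the absolute difference of two normal values, whose probability is known in closed form, and this serves as a check.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def range_cdf(w, n, lo=-10.0, hi=10.0, steps=4000):
    """G_n(w) = n * integral of (Phi(x + w) - Phi(x))^(n - 1) * phi(x) dx,
    evaluated by the trapezoid rule for a standard normal initial distribution."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        f = (norm_cdf(x + w) - norm_cdf(x)) ** (n - 1) * norm_pdf(x)
        total += f if 0 < i < steps else 0.5 * f
    return n * total * h


# Check for n = 2: P(|X1 - X2| <= w) = 2 * Phi(w / sqrt(2)) - 1.
w = 1.0
print(range_cdf(w, 2), 2 * norm_cdf(w / math.sqrt(2)) - 1)
```

The same routine gives the probability of the range for larger $n$ at the cost of one integration per value of $w$, which is the lengthy computation the text refers to.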

The problem of the range for the normal distribution was first raised
twenty-five years ago by L. von Bortkiewicz [1, 2]. For $n = 2$ and $n = 3$ the
distribution of the normal range may be written down explicitly [12, 13]. For larger
normal samples up to $n = 20$, E. S. Pearson [16] and H. O. Hartley [10] have
calculated numerical tables of the probability of the range. L. H. C. Tippett
[20] has calculated the mean, the standard deviation, and the moment quotients
for the range of the normal distribution up to $n = 1000$. He gave formulae for
the moments in the form of integrals. Finally "Student" [18] reproduced the
distribution of the range for small samples, $n = 2, 3, 4, 5, 6, 10$, by Pearson's
type I, and gave a formula for large samples $n = 20, 60$, based on Pearson's type
VI, a procedure which is purely empirical and, therefore, unsatisfactory for
theoretical purposes. A good résumé of the present knowledge about the
range is given in Karl Pearson's Tables [17].

All these studies are confined to the normal distribution and allow no
conclusion about the asymptotic distribution of the range. According to Kendall [11]
it is not known whether such forms exist and what they are. This question may
at once be answered for a special case. If the distribution is limited to the left
(or to the right), the asymptotic distribution of the range is equal to the
asymptotic distribution of the largest (smallest) value. The asymptotic distribution
of the range exists provided that an asymptotic distribution of the largest
(smallest) value exists. For the exponential distribution, and for initial
distributions of the Pareto type, for example, the asymptotic distribution of the
range is equal to the asymptotic distribution of the largest value. The
asymptotic distribution of the range for the rectangular distribution has been derived
by A. G. Carlton [3].

4. The asymptotic distribution of the reduced range for a symmetrical
variate. Instead of the procedures mentioned in the last paragraph, let us
consider a large sample. It is generally assumed that the smallest and the
largest values are independent in that case. L. H. C. Tippett [20] has shown
that the correlation between the extremes is negligible for the normal distribution
and for sample sizes $n \ge 200$. In a previous note [9] it has been shown that
independence holds for large samples and for initial distributions of the
exponential type unlimited in both directions and possessing all moments. Then
the joint distribution (1) splits into the product of the asymptotic distribution
$f_1(x_1)$ of the smallest value $x_1$ and the asymptotic distribution $f_n(x_n)$ of the largest
value $x_n$

(4)  $\mathfrak{w}(x_1, x_n) = f_1(x_1)\, f_n(x_n)$.

If, furthermore, the initial distribution is symmetrical about zero, the two
asymptotic distributions are

(5)  $f_1(x_1) = \alpha \exp[\alpha(x_1 + u) - e^{\alpha(x_1 + u)}]$;  $f_n(x_n) = \alpha \exp[-\alpha(x_n - u) - e^{-\alpha(x_n - u)}]$.

These asymptotic distributions and the corresponding probabilities are traced,
in a reduced scale, on Graphs (1) and (2).

Since the two parameters $u$ and $\alpha$ will exist also in the asymptotic distribution
of the range, their nature must briefly be explained. The value $u$ is defined as
the solution of

(6)  $\Phi(u) = 1 - \frac{1}{n}$.

Since

(6')  $n(1 - \Phi(u)) = 1$,

the largest value $u$ may be called the expected largest value. It differs, of course,
from the mean of the largest value. It has been shown [6] that $u$ increases as
a function of the logarithm of $n$, the function depending upon the initial
distribution.

Criteria for the approach of the distribution of the largest value toward its
asymptotic form have been given by R. A. Fisher and L. H. C. Tippett [4].
For our purpose it is sufficient to consider whether $n$ is so large that $u$ is very
near to the most probable largest value $\tilde{x}_n$ obtained from

(7)  $\frac{n - 1}{\Phi(\tilde{x}_n)}\, \varphi(\tilde{x}_n) = -\frac{\varphi'(\tilde{x}_n)}{\varphi(\tilde{x}_n)}$.

If

$\tilde{x}_n \approx u$

holds with sufficient approximation, $2u$ may be interpreted as the range of the
modes for an initial symmetrical distribution.

The parameter $\alpha$ defined by

(8)  $\alpha = \frac{\varphi(u)}{1 - \Phi(u)}$

also is a function of $n$. Three cases have to be distinguished: In the first case, $\alpha$
is a constant, or converges with $n$ toward a constant different from zero. In the
second (and third) case, $\alpha$ increases with $n$ without limit (decreases with $n$
toward zero). The three cases correspond to three classes of initial distributions
of the exponential type. The function $\alpha$ is related to the asymptotic standard
error of the largest, and of the smallest value by

(9)  $\sigma_1^2 \alpha^2 = \alpha^2 \sigma_n^2 = \frac{\pi^2}{6}$.

If $\alpha$ increases (decreases) with $n$, or is independent of $n$, the standard error of
the largest value decreases (increases) with the sample size, or is independent
of it. This behavior has nothing to do with the fact that the standard error of
the mean decreases, of course, with an increasing number of samples.
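For a given initial distribution the two parameters follow directly from (6) and (8). The sketch below takes the standard normal distribution as an assumed example, solves (6) by bisection, and uses (6') to write (8) as $\alpha = n\varphi(u)$; the standard error is then taken from (9). The function names and the bisection bounds are choices made here for illustration.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def extreme_parameters(n):
    """Return (u, alpha, sigma) for samples of size n >= 2 from a standard normal
    initial distribution: Phi(u) = 1 - 1/n, alpha = phi(u) / (1 - Phi(u)) = n * phi(u),
    sigma = pi / (alpha * sqrt(6))."""
    target = 1 - 1 / n
    lo, hi = -10.0, 10.0
    for _ in range(200):                 # bisection on the monotone function Phi
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    alpha = n * norm_pdf(u)
    sigma = math.pi / (alpha * math.sqrt(6))
    return u, alpha, sigma


for n in (10, 100, 1000):
    print(n, extreme_parameters(n))
```

In this normal example $\alpha$ grows with $n$, so the standard error of the largest value shrinks as the sample size increases, which corresponds to the second of the three cases distinguished above.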

The determination of the constants $u$ and $\alpha$ from equations (6), (7), (8) is
based on the knowledge of the initial distribution and the sample size $n$ from
which we take the largest observation. This method cannot be used in many
practical applications: 1) It may happen that the initial distribution, or the
parameters it contains, are unknown. Therefore the parameters of the largest
value cannot be obtained from it. 2) The initial distribution might be known,
but the number of observations is insufficient to warrant this procedure, because
the most probable largest value $\tilde{x}_n$ differs from the expected value $u$. In these
cases the parameters $u$ and $\alpha$ have to be estimated from the observed distribution
of the largest value alone. A similar procedure will be used for the range in
paragraph 7.

From (4) and (5) the joint asymptotic distribution $\mathfrak{w}(x_1, w)$ of the smallest
value $x_1$ and the range $w$ becomes

$\mathfrak{w}(x_1, w) = \alpha^2 \exp[-\alpha(w - 2u) - e^{\alpha(x_1 + u)} - e^{-\alpha(x_1 + w - u)}]$.

The asymptotic distribution $g(w)$ of the range alone is, dropping the index 1,

(4')  $g(w) = \alpha^2 e^{-\alpha(w - 2u)} \int_{-\infty}^{+\infty} \exp[-e^{\alpha(x + u)} - e^{-\alpha(x + w - u)}]\, dx$.

This distribution contains the two parameters $\alpha$ and $u$ existing in the asymptotic
distribution of the largest value. To eliminate the two parameters, a reduced
range $R$ is introduced by

(10)  $R = \alpha(w - 2u)$.

The range $w$ is a positive variate unlimited toward the right. The reduced
range $R$ is also unlimited toward the right yet limited toward the left by

(10')  $R \ge -2\alpha u$.

The reduced range is not related to one of the averages of the range. It is the
range minus the range of the modes divided by a factor which is proportional to
the standard error of the extreme value. The distribution $\psi(R)$ of the reduced
range $R$, and the distribution $g(w)$ of the range $w$ are related by

(11)  $\psi(R) = \frac{1}{\alpha}\, g(w)$,

subject to restriction (10'), whereas the probability $\Psi(R)$ of the reduced range to
be equal to, or less than $R$ is equal to the corresponding expression $G(w)$ for the
range proper

(11')  $\Psi(R) = G(w)$.

For the integration in (4') we put

$\alpha(x + w - u) = -y$

whence, from (10),

$\alpha(x + u) = -y - R$.

The asymptotic distribution of the reduced range becomes

(12)  $\psi(R) = e^{-R} \int_{-\infty}^{+\infty} \exp[-e^{y} - e^{-y - R}]\, dy$

and the asymptotic probability $\Psi(R)$ of the range is

(13)  $\Psi(R) = \int_{-\infty}^{+\infty} \exp[y - e^{y} - e^{-y - R}]\, dy$,

an expression which may easily be verified by differentiation.

The asymptotic formulas (12) and (13) hold for any initial symmetrical
distribution of the exponential type, for example, for the normal and the logistic
distribution (see par. 7). The mean reduced range $\bar{R}$ and the higher moments
of the reduced range are easily obtained from the mean $\bar{w}$, the variance $\sigma_w^2$, and
the invariants $\lambda_\nu$ of order $\nu$ of the range proper $w$ given in a previous paper [8].
They are

(14)  $\bar{w} = 2u + \frac{2\gamma}{\alpha}$;  $\sigma_w^2 = \frac{\pi^2}{3\alpha^2}$

(15)  $\lambda_\nu = \frac{2(\nu - 1)!}{\alpha^\nu} \sum_{k=1}^{\infty} \frac{1}{k^\nu}$;  $\nu \ge 2$

where $\gamma$ stands for Euler's constant.

Consequently the mean $\bar{R}$, the variance $\sigma_R^2$ and the invariants $\Lambda_\nu$ of the reduced
range are

(16)  $\bar{R} = 2\gamma$;  $\sigma_R^2 = \frac{\pi^2}{3}$;  $\Lambda_\nu = 2(\nu - 1)! \sum_{k=1}^{\infty} \frac{1}{k^\nu}$;  $\nu \ge 2$.

Equation (14) leads to an interpretation of the reduction (10) which may be
written

$R = \alpha(w - \bar{w}) + 2\gamma$

or

(14')  $R = \frac{\pi}{\sqrt{3}} \frac{w - \bar{w}}{\sigma_w} + 2\gamma$.

Thus the transformation (10) is a linear function of the standard transformation
$(w - \bar{w})/\sigma_w$ usual in statistics.


5. The probability of the range as a Bessel function. The integrals
(12) and (13) may be evaluated by numerical procedures, since tables of the
function $\exp(-e^{-y})$ are easily calculated. However, it turned out to be simpler
to relate these integrals to the solution of a differential equation. The
derivative $\psi'(R)$ of the distribution (12) is

$\psi'(R) = -\psi(R) + e^{-R} \int_{-\infty}^{+\infty} \exp[-y - R - e^{y} - e^{-y - R}]\, dy$.

The integral is equal to the probability $\Psi(R)$ since the transformation

$y + R = -z$

leads to

$\int_{-\infty}^{+\infty} \exp[-y - R - e^{y} - e^{-y - R}]\, dy = \int_{-\infty}^{+\infty} \exp[z - e^{-z - R} - e^{z}]\, dz$.

Consequently the probability $\Psi(R)$ is subject to the differential equation

(17)  $\frac{d^2\Psi}{dR^2} + \frac{d\Psi}{dR} - e^{-R}\, \Psi = 0$.

The mode of the reduced range is a fixed value $\tilde{R}$ such that

(18)  $\psi(\tilde{R}) = e^{-\tilde{R}}\, \Psi(\tilde{R})$.

Mr. W. Wasow (Swarthmore College) has drawn my attention to the fact that
the probability $\Psi(R)$ of the range can be expressed in terms of a Bessel function.²
To obtain this simplification of the differential equation we introduce a new
positive variable $z$ by

(19)  $z = 2e^{-R/2}$

and a new function $U$ by

(20)  $\Psi = U \cdot z$.

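The boundary conditions are

(21)  $z = 0$, $\Psi = 1$;  $z = \infty$, $\Psi = 0$.

The first derivative becomes, from (19)

$\frac{d\Psi}{dR} = -\frac{z}{2} \frac{d\Psi}{dz}$

whence, from (20)

$\frac{d\Psi}{dR} = -\frac{zU}{2} - \frac{z^2}{2} \frac{dU}{dz}$.

The second derivative becomes, by the same procedure

$\frac{d^2\Psi}{dR^2} = -\frac{z}{2} \frac{d}{dz} \left( -\frac{zU}{2} - \frac{z^2}{2} \frac{dU}{dz} \right)$.

The second member may be written

$\frac{z}{2} \left( \frac{U}{2} + \frac{3z}{2} \frac{dU}{dz} + \frac{z^2}{2} \frac{d^2U}{dz^2} \right) = \frac{zU}{4} + \frac{3z^2}{4} \frac{dU}{dz} + \frac{z^3}{4} \frac{d^2U}{dz^2}$.

Thus the differential equation (17) is now

$\frac{z^3}{4} \frac{d^2U}{dz^2} + \frac{3z^2}{4} \frac{dU}{dz} + \frac{zU}{4} - \frac{zU}{2} - \frac{z^2}{2} \frac{dU}{dz} - \frac{z^3}{4}\, U = 0$.

Multiplication by $4z^{-1}$ leads to

(21')  $z^2 \frac{d^2U}{dz^2} + z \frac{dU}{dz} - (z^2 + 1)\, U = 0$.

This is one of the classical Bessel differential equations of order 1. In the
notation used by the British Tables [14] (pp. 264 and 213) one of the solutions is

(22)  $U(z) = K_1(z)$,

² I profit of this occasion to thank him for this and other valuable suggestions.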

where $K_1(z)$, the modified Bessel function of the second kind (Hankel function),
is defined by

(23)  $K_1(z) = (\gamma - \lg 2 + \lg z) \sum_{\nu=0}^{\infty} \frac{(z/2)^{2\nu+1}}{\nu!\,(\nu + 1)!} + \frac{1}{z} - \sum_{\nu=1}^{\infty} \frac{(z/2)^{2\nu-1}}{(\nu - 1)!\,\nu!} \left( 1 + \frac{1}{2} + \cdots + \frac{1}{\nu} - \frac{1}{2\nu} \right)$.

The relation between the functions $K_\nu(z)$ and the Hankel function is

(23a)  $K_\nu(z) = \frac{\pi}{2}\, i^{\nu+1} H_\nu^{(1)}(iz)$.

The asymptotic probability for the range is, from (20) and (22),

(24)  $\Psi(R) = z K_1(z)$

or, from (19)

(25)  $\Psi(R) = 2e^{-R/2} K_1(2e^{-R/2})$.

This is the only Bessel function satisfying the boundary conditions (21). The
asymptotic probability $\Psi(R)$ of the range may be written finally from (25), (23)
and (10)

(25a)  $1 - \Psi(R) = \sum_{\nu=1}^{\infty} \frac{e^{-\nu R}}{(\nu - 1)!\,\nu!} \left( R - 2\gamma + 2S_\nu - \frac{1}{\nu} \right)$

where

$S_0 = 0$;  $S_\nu = \sum_{\lambda=1}^{\nu} \frac{1}{\lambda}$.


The distribution

$\psi(R) = \frac{d\Psi(R)}{dz} \frac{dz}{dR}$

of the reduced range $R$ is, from (24) and (19)

$\psi(R) = -\frac{z}{2} (K_1(z) + z K_1'(z))$.

Now, the derivative $K_1'(z)$ is linked to the modified Bessel function $K_0(z)$ of
the second kind and of order zero by

$z K_1'(z) = -K_1(z) - z K_0(z)$.

Consequently the distribution is

(26)  $\psi(R) = \frac{z^2}{2} K_0(z)$

or, from (19),

(27)  $\psi(R) = 2e^{-R} K_0(2e^{-R/2})$

where the function $K_0(z)$ is defined by

(28)  $K_0(z) = -(\gamma - \lg 2 + \lg z) \sum_{\nu=0}^{\infty} \frac{(z/2)^{2\nu}}{\nu!\,\nu!} + \sum_{\nu=1}^{\infty} \frac{(z/2)^{2\nu}}{\nu!\,\nu!} \left( 1 + \frac{1}{2} + \cdots + \frac{1}{\nu} \right)$.

Finally the asymptotic distribution $\psi(R)$ of the reduced range may be written
from (27) and (28)

(28a)  $\psi(R) = \sum_{\nu=0}^{\infty} \frac{\exp[-(\nu + 1)R]}{\nu!\,\nu!} (R - 2\gamma + 2S_\nu)$.


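We first investigate the analytic behavior and the order of magnitude of the probability $\Psi(R)$ and the distribution $\psi(R)$ for large negative and large positive values of the reduced range, i.e., for large and small values of the positive variable $z$. If $z$ is so large that

(29) $z^{-3} = \frac{e^{3R/2}}{8} \ll 1$

the expressions for $K_1(z)$ and $K_0(z)$ become [14], p. 271,

$K_1(z) = \sqrt{\frac{\pi}{2z}}\, e^{-z}\left(1 + \frac{3}{8z} - \frac{15}{128z^2}\right); \quad K_0(z) = \sqrt{\frac{\pi}{2z}}\, e^{-z}\left(1 - \frac{1}{8z} + \frac{9}{128z^2}\right)$.

The probability $\Psi(R)$ becomes, from (24) and (19),

(25') $\Psi(R) = \sqrt{\pi}\, \exp\left[-\frac{R}{4} - 2e^{-R/2}\right]\left(1 + \frac{3}{16}e^{R/2} - \frac{15}{512}e^{R}\right)$.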



The condition (29) holds, say, for $R = -4$. The numerical calculation leads, for $\Psi(-4)$, to the order of magnitude $10^{-6}$.

In the same way, the distribution $\psi(R)$ becomes, from (26) and (19), for large negative reduced ranges

(27') $\psi(R) = \sqrt{\pi}\, \exp\left[-\frac{3R}{4} - 2e^{-R/2}\right]\left(1 - \frac{e^{R/2}}{16} + \frac{9}{512}e^{R}\right)$.


This expression cannot be obtained from (25') since the approximations used for $K_0(z)$ and $K_1(z)$ do not fulfill the relations between the derivatives given above. The order of magnitude of $\psi(-4)$ is $10^{-6}$.

Thus the probability $\Psi(R)$ and the distribution $\psi(R)$ may be neglected for $R \le -4$. This removes the importance of the lower limit $R \ge -2\alpha u$ stated in (10'). If $\alpha u \ge 2$, the distribution of the range may be dealt with as if it were practically unlimited toward the left.

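For large positive reduced ranges to which correspond small values of $z$, say

(29') $z^3 = 8e^{-3R/2} \ll 1$

the Bessel functions $K_1(z)$ and $K_0(z)$ become, from (23) and (28),

(23') $K_1(z) = \frac{z}{2}\left(\gamma + \lg\frac{z}{2}\right)\left(1 + \frac{z^2}{8}\right) + \frac{1}{z} - \frac{z}{4}\left(1 + \frac{5z^2}{16}\right)$

(28') $K_0(z) = -\left(\gamma + \lg\frac{z}{2}\right)\left(1 + \frac{z^2}{4}\right) + \frac{z^2}{4} + \frac{3}{128}z^4$.

In this case we are interested to know how far the probability $\Psi(R)$ differs from unity. Consequently we calculate $1 - \Psi(R)$ and obtain, from (24) and (23'),

$1 - \Psi(R) = -\frac{z^2}{2}\left[\left(\gamma + \lg\frac{z}{2}\right)\left(1 + \frac{z^2}{8}\right) - \frac{1}{2} - \frac{5}{32}z^2\right]$.

The right side becomes, from (19),

$e^{-R}\left[(R - 2\gamma)\left(1 + \frac{e^{-R}}{2}\right) + 1 + \frac{5}{4}e^{-R}\right]$

or

$e^{-R}\left[R - 2\gamma + 1 + \frac{e^{-R}}{2}\left(R - 2\gamma + \frac{5}{2}\right)\right] = e^{-R}(R - 2\gamma + 1)\left(1 + \frac{e^{-R}}{2}\right) + \frac{3}{4}e^{-2R}$.

If $R$ is so large that

$\frac{e^{-R}}{2} \ll 1$

we simply have

(25'') $1 - \Psi(R) = e^{-R}(R - 2\gamma + 1)$.

For example, for $R = 10$, the preceding condition is satisfied and $1 - \Psi(R)$ is of the order $5 \cdot 10^{-4}$.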

In the same manner we calculate the density of probability $\psi(R)$ for large reduced ranges. From (26), (19) and (28') we obtain

$\psi(R) = 2e^{-R}\left[\left(\frac{R}{2} - \gamma\right)(1 + e^{-R}) + e^{-R} + \frac{3}{8}e^{-2R}\right]$.

By neglecting $e^{-2R} \ll R$, the right side becomes

$e^{-R}\left[(R - 2\gamma)(1 + e^{-R}) + 2e^{-R}\right] = e^{-R}\left[R - 2\gamma + e^{-R}(R - 2\gamma + 2)\right]$

whence

$\psi(R) = e^{-R}(R - 2\gamma)(1 + e^{-R}) + 2e^{-2R}$.

In first approximation we obtain

(27'') $\psi(R) = e^{-R}(R - 2\gamma)$

a formula which may also be derived directly from (25''). The density of probability is of the order $10^{-4}$ for $R = 10$.
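The tail formulas (25'), (25'') and (27'') are simple enough to evaluate directly. The sketch below does so in Python; the function names are hypothetical and the comparison points are the end rows of Table I.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, written gamma in the text


def cdf_left_tail(R):
    """Approximation (25') to Psi(R), intended for large negative R."""
    lead = math.sqrt(math.pi) * math.exp(-R / 4.0 - 2.0 * math.exp(-R / 2.0))
    correction = 1.0 + 3.0 / 16.0 * math.exp(R / 2.0) - 15.0 / 512.0 * math.exp(R)
    return lead * correction


def survival_right_tail(R):
    """Approximation (25'') to 1 - Psi(R), intended for large positive R."""
    return math.exp(-R) * (R - 2.0 * EULER_GAMMA + 1.0)


def pdf_right_tail(R):
    """Approximation (27'') to psi(R), intended for large positive R."""
    return math.exp(-R) * (R - 2.0 * EULER_GAMMA)


if __name__ == "__main__":
    print("Psi(-3)     ~", cdf_left_tail(-3.0))
    print("1 - Psi(10) ~", survival_right_tail(10.0))
    print("psi(10)     ~", pdf_right_tail(10.0))
```

Differentiating `1 - survival_right_tail(R)` with respect to R gives `pdf_right_tail(R)` exactly, which is the statement that (27'') follows from (25'').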

From the formulae (25') and (27'), valid for large negative values of $R$, and from the formulae (25'') and (27''), valid for large positive values of $R$, follow the boundary conditions

$\lim_{R \to -\infty} \frac{\psi(R)}{\Psi(R)} = e^{-R/2}; \quad \lim_{R \to \infty} \frac{\psi(R)}{1 - \Psi(R)} = \frac{R - 2\gamma}{R - 2\gamma + 1}$.

For the construction of tables of the distribution $\psi(R)$ and the probability $\Psi(R)$ of the reduced range it is sufficient to consider the interval

$-3 < R < 10$.


The two functions $K_1(z)$ and $K_0(z)$ have been tabulated [14] and [19]. Hence the probability and the distribution could be calculated from such tables of the Bessel functions. This procedure, however, was only used to obtain boundary values. Tables I and Ia are based on computations made in the Calculation and Ballistics Department at the Naval Proving Ground, Dahlgren, by stepwise integration of the differential equation (17) using the special Relay Calculator of the International Business Machines Corporation.³

Table I gives the probability $\Psi(R)$ (col. 2) and the distribution $\psi(R)$ (col. 4) for the reduced ranges $-3 \le R \le 10.5$ in intervals $\Delta R = 0.5$. The differences $\Delta\Psi$ given in col. 3 are taken from the original figures.

For different uses it is necessary to know the reduced range as a function of its probability. This relation is shown in Table Ia. The first column gives the probability, the first line gives the last decimal of this probability, and the cells give the reduced range corresponding to the probability obtained from the combination of the first column and the first line. For example, the reduced range $R = -3.20$ corresponds to the probability $\Psi(R) = 0.0002$, and the reduced range $R = 10.44$ corresponds to the probability $\Psi(R) = 0.9997$.
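The inverse relation tabulated in Table Ia can be reproduced approximately by inverting the probability numerically. The sketch below is one possible way, assuming the series (25a) for the probability and plain bisection for the inversion; the function names, the bracket from -4 to 20, and the tolerance are illustrative choices and not the procedure used for the published table.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, written gamma in the text


def range_cdf(R, terms=80):
    """Probability Psi(R) of the reduced range, summed from series (25a)."""
    total = 0.0
    harmonic = 0.0  # running value of S_v
    for v in range(1, terms + 1):
        harmonic += 1.0 / v
        weight = math.exp(-v * R) / (math.factorial(v - 1) * math.factorial(v))
        total += weight * (R - 2.0 * EULER_GAMMA + 2.0 * harmonic - 1.0 / v)
    return 1.0 - total


def reduced_range_quantile(p, lo=-4.0, hi=20.0, tol=1e-10):
    """Reduced range R with Psi(R) = p, found by bisection on [lo, hi]."""
    if not range_cdf(lo) < p < range_cdf(hi):
        raise ValueError("p lies outside the bracketed interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if range_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for p in (0.0002, 0.5, 0.95, 0.99):
        print(p, round(reduced_range_quantile(p), 2))
```

Bisection is chosen only because the probability is monotone in R and no derivative is needed; any bracketing root finder would serve equally well.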

This table may be used for obtaining the percentage points of the reduced range. The mode $\tilde{R}$, the median $\breve{R}$ calculated by the Naval Proving Ground, and the mean $\bar{R}$ obtained from (14) and (10) are

(30) $\tilde{R} = 0.506366440; \quad \breve{R} = 0.928597642; \quad \bar{R} = 1.154431330$.

A probability paper for the range may be constructed in the following way: The observed ranges $w$ are plotted on the vertical axis; the reduced ranges $R$ on a horizontal axis. The abscissa shows the probabilities

$\Psi(R) = G(w)$

³ The author wishes to express his sincere appreciation for the permission to use these computations. The original tables give the probability and the distribution to 8 significant decimal places at intervals $\Delta R = 1/100$. Lack of space prevents the reproduction of these
tables.

TABLE I

Asymptotic Probability and Asymptotic Distribution of the Reduced Range

     (1)             (2)            (3)            (4)
Reduced Range    Probability    Difference    Distribution
      R             Ψ(R)            ΔΨ            ψ(R)

    -3.0           .00050         .00274         .00212
    -2.5           .00324         .01032         .01057
    -2.0           .01356         .02693         .03386
    -1.5           .04048         .05251         .07705
    -1.0           .09299         .08141         .13419
    -0.5           .17440         .10533         .18969
     0.0           .27973         .11821         .22779
     0.5           .39794         .11859         .24075
     1.0           .51654         .10891         .23021
     1.5           .62545         .09327         .20346
     2.0           .71872         .07557         .16898
     2.5           .79429         .05860         .13360
     3.0           .85289         .04386         .10157
     3.5           .89675         .03192         .07483
     4.0           .92867         .02270         .05375
     4.5           .95136         .01584         .03783
     5.0           .96721         .01089         .02618
     5.5           .97810         .00739         .01787
     6.0           .98549         .00496         .01205
     6.5           .99045         .00330         .00805
     7.0           .99375         .00218         .00534
     7.5           .99594         .00143         .00351
     8.0           .99737         .00093         .00230
     8.5           .99830         .00061         .00150
     9.0           .99891         .00039         .00097
     9.5           .99930         .00025         .00062
    10.0           .99955         .00016         .00040
    10.5           .99972                        .00026


corresponding to the reduced ranges $R$. If the observations follow the theory, the observed ranges are scattered around the straight line

(10') $w = 2u + \frac{R}{\alpha}$.

If the samples are drawn simultaneously, and if there is a constant interval of time between the drawings, this interval may be used as unit of time for the construction of the return periods $T(R)$ and ${}_1T(R)$ of a range equal to, or larger than (smaller than) $R$, where

$T(R) = \frac{1}{1 - \Psi(R)}; \quad {}_1T(R) = \frac{1}{\Psi(R)}$.

The first (second) notion applies to the range above (below) the median. The return periods are shown in an upper parallel to the abscissa.

A scheme for this paper is given in Fig. 3. Such a paper will allow a graphical test for the fit of the observed ranges to our theory, and avoids any numerical calculations. Obviously this method may only be used if the initial distribution is symmetrical, unlimited, and of the exponential type, and if the sample size is so large that the asymptotic distribution holds.

6. The range, the midrange, and the extremes. The asymptotic distribution (27) of the reduced range was obtained by convolution of the asymptotic distributions (5) of the extremes. The same method leads to the asymptotic distribution of the reduced midrange [8]

(31) $v = \alpha(x_1 + x_n)$.

TABLE IA

The Reduced Range R as Function of Its Probability Ψ(R)

Ψ(R)       0       1       2       3       4       5       6       7       8       9

.000       -       …   -3.20   -3.12   -3.05       …   -2.96   -2.92   -2.89   -2.86
.00        -   -2.83   -2.64   -2.52   -2.43   -2.36       …   -2.25       …   -2.16
.0         -   -2.12   -1.84   -1.65   -1.51   -1.39   -1.28   -1.19       …   -1.02
.1         …       …       …       …       …   -0.63       …       …       …   -0.42
.2         …       …       …       …       …   -0.13       …       …       …    0.04
.3         …       …       …       …       …       …       …       …       …    0.47
.4         …       …       …       …       …    0.72       …       …       …    0.89
.5      0.93    0.97    1.02       …       …    1.15       …    1.24    1.28    1.33
.6      1.38    1.43    1.47    1.52    1.57    1.62    1.67    1.73       …    1.84
.7      1.89    1.95    2.01       …    2.13    2.19    2.26    2.33       …    2.47
.8      2.54    2.62    2.70    2.79    2.88    2.97    3.07    3.18       …    3.41
.9      3.54    3.69    3.85       …    4.23       …    4.75    5.11    5.61    6.45
.99     6.45    6.57    6.71    6.87       …    7.26    7.52    7.85    8.31    9.10
.999    9.10    9.22    9.35       …    9.67    9.88       …   10.44       *       *

* These values have not been calculated.


On the other hand, the asymptotic distributions of the reduced extremes are obtained by introducing the transformations

(32) $y_1 = \alpha(x_1 + u); \quad y_n = \alpha(x_n - u)$

into formulas (5). It is interesting to compare these four distributions and four probabilities with each other. This is done in Figures 1 and 2. The probability and the distribution of the midrange are practically identical with the probability and distribution of the smallest value for small values of the midrange, and become practically identical with the probability and distribution of the largest value for large values of the midrange. Fig. 2 shows that the asymptotic distribution of the reduced range is less asymmetrical than the asymptotic distributions of the reduced extremes.

Table II contains some characteristic values for these four asymptotic distributions. The first three columns are obtained from previous publications [6, 8]. The mean range is equal to the range of the means for the extremes. The median of the range is larger than the range from the median of the largest to the median of the smallest value. The mode of the range is slightly smaller than the mean of the largest value. These statements hold, of course, only for the reduced variates.



[Fig. 2. Horizontal axis: reduced variate.]


From the mode $\tilde{R}$ of the reduced range given in equation (30) and the transformation (10), the mode $\tilde{w}$ of the range itself is obtained as

$\tilde{w} = 2u + \frac{\tilde{R}}{\alpha}$

whereas the difference of the modes of the largest and of the smallest values is

(33) $\tilde{x}_n - \tilde{x}_1 = 2u$.

[Figure. Axis labels: ${}_1T(R)$, return period $T(R)$.]

For a symmetrical initial distribution of the exponential type the mode of the range converges toward the range of the modes of the smallest and of the largest value, provided that the parameter $\alpha$ increases without limit with the sample size. Thus this convergence does not hold for all symmetrical distributions.

The last two lines in Table II give the four probabilities corresponding to the intervals from the mean $\mu$ minus once (twice) the standard deviation $\sigma$, up to the mean plus once (twice) the standard deviation. The first probability for the four distributions is about the same as for the normal distribution. The second probability for the range and the midrange is about the same as for the normal one.

TABLE II

Characteristics for the 4 Asymptotic Reduced Distributions

(1) Characteristic             (2) Largest Value     (3) Smallest Value    (4) Midrange          (5) Range

Mode                           0                     0                     0                     .506
Expectation                    γ = .57722            -γ = -.57722          0                     2γ = 1.15444
Median                         -lg lg 2 = .36651     lg lg 2 = -.36651     0                     .929
Seminvariant char. function    Γ(1 - t)              Γ(1 + t)              Γ(1 - t)Γ(1 + t)      Γ²(1 - t)
Variance                       π²/6 = 1.64493        π²/6 = 1.64493        π²/3 = 3.28986        π²/3 = 3.28986
First moment quotient β₁       1.29857               1.29857               0                     .64928
Second moment quotient β₂      5.4                   5.4                   4.2                   4.2
95% Probability                2.97                  1.10                  2.94                  4.46
99% Probability                4.60                  1.53                  4.60                  6.45
F(μ + σ) - F(μ - σ)            .72                   .72                   .72                   .71
F(μ + 2σ) - F(μ - 2σ)          .90                   .90                   .95                   .95

7. The asymptotic distribution of the range for a symmetrical variate.

The asymptotic distribution $\psi(R)$ of the reduced range $R$ is, of course, independent of the sample size, and parameter-free. Both statements do not hold for the distribution $g(w)$ of the range proper which is, from (11),

(34) $g(w) = \alpha\psi[\alpha(w - 2u)]$.

In this formula, the range is expressed in the same units as the initial variate. The parameters $\alpha$ and $u$ are functions of the sample size $n$, the function depending upon the initial distribution. From equations (6), (8), (14) it follows that an increase of the sample size has two influences on the distribution of the range. The increase of the parameter $u$ shifts the distribution toward the right without changing its form, whereas the parameter $\alpha$ influences the shape of the distribution. If $\alpha$ increases (decreases) with $n$, the distribution of the range shrinks (spreads) with increasing sample size. If $\alpha$ is independent of $n$, an increase of the sample size does not change the shape of the distribution. Only in the first case may we increase the precision of the range by increasing the sample size. The two parameters thus influence the range in the same way as they influence the extreme values.

To use equation (34) for a given initial distribution and a given sample size, we have to determine the expected largest value $u$ and the parameter $\alpha$ as functions of $n$. We may use the definitions (6), (7), (8) if the initial distribution is known and of the exponential type, and if the sample size is so large that the most probable largest value is sufficiently near to the solution of (7).

As a first example, consider the so-called logistic distribution. This probability is

(35) $\Phi(x) = (1 + e^{-x})^{-1}$.

The initial distribution is

(35') $\varphi(x) = \Phi(x)(1 - \Phi(x))$

and the derivative is

(35'') $\varphi'(x) = \Phi(x)(1 - \Phi(x))(1 - 2\Phi(x))$.

Equation (6) becomes

$1 + e^{-u} = \frac{n}{n - 1}$

whence the expected largest value

(36) $u = \lg(n - 1)$.

The most probable largest value $\tilde{x}_n$ for $n$ observations is obtained from (7). This equation becomes, from equation (35),

$(n - 1)(1 - \Phi(\tilde{x}_n)) = -1 + 2\Phi(\tilde{x}_n)$

whence

$\Phi(\tilde{x}_n) = \frac{n}{n + 1}$.

Equation (35) leads to the most probable largest value

(36') $\tilde{x}_n = \lg n$.

Even for $n$ as small as 30 the difference between $\tilde{x}_n$ and $u$ is less than 1%. Consequently the asymptotic form of the distribution of the range may be used even for small samples. The two parameters are

(37) $u = \lg n; \quad \alpha = \frac{n}{n + 1}$.

Since $\alpha$ converges toward unity, an increase of the sample size shifts the distribution of the range toward the right without influencing its shape: the precision of any estimate made from the range cannot be increased by increasing the sample size.
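The two largest-value characteristics of the logistic example can be checked numerically. The sketch below, with hypothetical function names, evaluates (36) and locates the mode of the largest of n logistic observations by a ternary search on the log-density, which is concave for this distribution; the search interval and iteration count are arbitrary illustrative choices.

```python
import math


def expected_largest_logistic(n):
    """u from (36): the solution of Phi(u) = 1 - 1/n for the logistic law."""
    return math.log(n - 1)


def log_density_largest(x, n):
    """Logarithm of n * Phi(x)**(n - 1) * phi(x) for the logistic law (35)."""
    return math.log(n) - n * math.log1p(math.exp(-x)) - math.log1p(math.exp(x))


def modal_largest_logistic(n, lo=-5.0, hi=50.0, iterations=200):
    """Most probable largest value, located by ternary search."""
    for _ in range(iterations):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if log_density_largest(a, n) < log_density_largest(b, n):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    n = 30
    u = expected_largest_logistic(n)
    mode = modal_largest_logistic(n)
    print("u =", u, " mode =", mode, " lg n =", math.log(n))
    print("relative difference:", (mode - u) / mode)
```

For n = 30 the numerically located mode agrees with lg n from (36'), and its relative difference from u falls just under one per cent, in line with the statement in the text.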

The characteristic ranges introduced in paragraph 5 are obtained immediately: the mean $\bar{w}$, the mode $\tilde{w}$, the median range $\breve{w}$ and the ranges $w_{.95}$ and $w_{.99}$

$\bar{w} = 2\lg n + 1.154; \quad \tilde{w} = 2\lg n + .506;$

$\breve{w} = 2\lg n + .929; \quad w_{.95} = 2\lg n + 4.46; \quad w_{.99} = 2\lg n + 6.45$

are parallel straight lines if traced as functions of the sample size $n$ on semi-logarithmic paper.

For the normal distribution we cannot expect such simple results. Here, $u$ and $\alpha$ can only be calculated as numerical functions of $n$ although limiting forms of these functions are known. The parameter $\alpha$ increases with $n$, and the standard error of the range decreases without limit although very slowly. The logistic distribution belongs to the first, the normal distribution to the second class of initial distributions of the exponential type.

The probabilities and the distributions of the range for normal samples of size 5, 10, and 20 as calculated by E. S. Pearson and H. O. Hartley [16] are traced in Figures 4 and 5. Our aim is to trace the corresponding asymptotic probabilities and distributions in order to see how far the asymptotic ranges differ from the exact ones. However, we have first to settle the preliminary question how far the most probable largest value $\tilde{x}_n$ differs from the expected largest value $u$. The most probable largest value $\tilde{x}_n$ is obtained from (7) which becomes, for the normal distribution,

(38) $\tilde{x}_n \Phi(\tilde{x}_n) = (n - 1)\varphi(\tilde{x}_n)$.

The results $\tilde{x}_n$ as functions of $n$ are shown in Table III, cols. 1 and 2. The expected values $u$ obtained from (6) are given in col. 3. For small samples, the two values $\tilde{x}_n$ and $u$ differ widely, as might be expected. We are inclined to conclude that the asymptotic distribution of the range cannot hold for small samples. However, the only legitimate conclusion to be drawn is that we cannot calculate the two parameters in the way stated before, (6) and (8). Instead, we estimate them directly from the observations. The question of the most efficient estimates of these parameters is not yet solved. The simplest way is to use the mean range $\bar{w}_n$ and the standard deviation of the range $\sigma_{w,n}$, as given by Tippett [20] and Pearson [15]. To distinguish these estimates from the asymptotic values, we write the estimates with an index $n$. From (14) we obtain

(39) $\frac{1}{\alpha_n} = \frac{\sqrt{3}}{\pi}\sigma_{w,n}; \quad 2u_n = \bar{w}_n - \frac{2\gamma}{\alpha_n}$.

Table III gives the calculated means $\bar{w}_n$ and standard deviations $\sigma_{w,n}$ of the range, and the estimates $1/\alpha_n$ and $2u_n$. Fig. 6 shows how the most probable largest values $\tilde{x}_n$ approach the expected largest value $u$ with increasing sample size. The estimate $u_n$ quickly approaches $u$. Besides, we trace the mean range $\bar{w}_n$, the standard error of the range $\sigma_{w,n}$, and $1/\alpha_n$ which is proportional to it.



From col. 8 it follows that the condition $\alpha u \ge 2$ is fulfilled from $n \ge 6$ onward. The ranges obtained from the transformations

(40) $w = 2u_n + \frac{R}{\alpha_n}$

are given in Table IV, cols. 3-7. The asymptotic probabilities of the range as obtained from the combination of columns 3-7 and col. 2 of Table IV are traced in Fig. 4 as separated points. The asymptotic probabilities are situated very near to the exact ones. Therefore the same method was used to calculate the asymptotic probabilities of the range for $n = 50$ and $n = 100$, which have not been calculated by Pearson. They too are traced in Fig. 4.



The asymptotic probabilities of the range hold even for small normal samples. However, the parameters obtained from the exact distribution differ considerably from their asymptotic values. In other words: The asymptotic probabilities of the range hold even for small normal samples provided that the parameters are taken from the observations.

To compare the asymptotic distributions of the normal range to the calculated distributions, we attribute the asymptotic differences $\Delta\Psi/\alpha_n$ for a unit interval $\Delta w = 1$ to the middle of the corresponding intervals. The results are traced in Fig. 5 for $n = 5, 10, 20, 50, 100$. On the other hand, we take the differences $\Delta\Psi$ for unit intervals from Pearson's tables, and trace them in the same graph. The fit of the calculated to the asymptotic values may be considered satisfactory.
TABLE III

Estimate of Parameters from the Calculated Distributions of the Normal Range

  (1)        (2)         (3)          (4)          (5)          (6)        (7)        (8)
Sample     Largest Value             Mean Range   Standard     Estimated parameters   Lower limit
           Modal       Expected u                 deviation    1/α_n      2u_n       2α_n u_n

    3       .765        .431          …           .8884        .4898      1.128       2.30
    4       .938        .674          …           .8798        .4851      1.499       3.09
    5        …          .842         2.326        .8641        .4764      1.776       3.73
   10      1.419       1.282         3.078          …          .439       2.571       5.86
   20        …         1.645         3.735        .729           …        3.271       8.14
   50      2.126         …           4.498        .653           …          …        11.34
  100      2.377       2.326           …            …          .334       4.630      13.86


TABLE IV

Asymptotic Probabilities for Normal Ranges Taken from Small Samples

  (1)          (2)                (3)       (4)       (5)       (6)       (7)
Reduced     Probability        Normal ranges w = 2u_n + R/α_n for sample sizes
range R     G(w) = Ψ(R)        n = 5     n = 10    n = 20    n = 50    n = 100

  -3           .000             .35      1.25      2.07      3.00      3.62
  -2           .014             .82      1.69      2.47      3.36      3.96
  -1           .093            1.30      2.13      2.87      3.72      4.30
   0           .280            1.78      2.57      3.27      4.08      4.63
   1           .517            2.25      3.01      3.67      4.44      4.96
   2           .719            2.73      3.45      4.07      4.80      5.30
   3           .853            3.21      3.89      4.48      5.16      5.63
   4           .929            3.68      4.33      4.88      5.52      5.97
   5           .967            4.16      4.77      5.28      5.88      6.30
   6           .985            4.63      5.20      5.68      6.24      6.63
   7           .994            5.11      5.64      6.09      6.60      6.97


Fig. 5 shows furthermore how the distributions of the range are shifted toward the right and become more concentrated for increasing sample sizes.

As an example for the practical application of the asymptotic distribution of the range, we use an observed distribution of 50 ranges taken from samples of $n = 14$ normal values given in Freeman's book [5], p. 128. The observed step function is traced in Fig. 7. For reasons given in a previous article [7] we attribute the cumulative frequency .5 to the smallest range 3, and the cumulative frequency 49.5 to the largest range 18. To compare this step function with the probability $G(w)$, we estimate the two parameters $u_n$ and $\alpha_n$ from formula (39). The mean range $\bar{w}_n$ and the estimate $s_{w,n}$ of the standard deviation of the ranges are

$\bar{w}_n = 10.68; \quad s_{w,n} = 2.93$.

Consequently we obtain, from (39),

$\frac{1}{\alpha_n} = 1.61; \quad 2u_n = 8.82$.

The theoretical ranges are thus, from (40),

$w = 8.82 + 1.61 R$.

[Fig. 7. Horizontal axis ticks: 4, 8, 12, 16, 20.]

The corresponding probabilities G(w) taken from Table I are traced in Fig. 7. 
The fit of the theory to the observations is certainly satisfactory, especially if 
we take into account that the ranges are given in integer numbers only. 
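The estimation step of this example, formulas (39) and (40), amounts to two lines of arithmetic. The sketch below reproduces it in Python; the function names are hypothetical, and the inputs are the mean range and standard deviation quoted in the text.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, written gamma in the text


def estimate_range_parameters(mean_range, sd_range):
    """Return (1/alpha_n, 2*u_n) from the mean and standard deviation, as in (39)."""
    inv_alpha = math.sqrt(3.0) / math.pi * sd_range
    two_u = mean_range - 2.0 * EULER_GAMMA * inv_alpha
    return inv_alpha, two_u


def theoretical_range(R, inv_alpha, two_u):
    """Range w that corresponds to the reduced range R, as in (40)."""
    return two_u + R * inv_alpha


if __name__ == "__main__":
    inv_alpha, two_u = estimate_range_parameters(10.68, 2.93)
    print("1/alpha_n =", round(inv_alpha, 2), " 2u_n =", round(two_u, 2))
    # Range with asymptotic probability .517 (R = 1 in Table I):
    print("w(R = 1) =", round(theoretical_range(1.0, inv_alpha, two_u), 2))
```

The same function applied to the n = 5 row of Table III (mean range 2.326, standard deviation .8641) returns the tabulated estimates .4764 and 1.776.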

8. The mth range and the asymmetrical case. An obvious generalization of the theory as established in paragraph 4 consists in the construction of the asymptotic distribution of the mth range for an unlimited symmetrical distribution of the exponential type. The mth range is the positive distance from the mth observation from above, $x_m$, to the mth observation from below, ${}_m x$. We suppose $m$ to be very small compared to the sample size. Under the conditions stated in the beginning, the joint distribution $\mathfrak{w}({}_m x, x_m)$ of the mth extreme values splits into the product of the asymptotic distribution $f_m(x_m)$ of the mth extreme value from above, by the asymptotic distribution ${}_m f({}_m x)$ of the mth extreme value from below. Here, [6]

$f_m(x_m) = \frac{m^m}{(m-1)!}\,\alpha_m \exp\left[-m\alpha_m(x_m - u_m) - m e^{-\alpha_m(x_m - u_m)}\right]$

${}_m f({}_m x) = \frac{m^m}{(m-1)!}\,\alpha_m \exp\left[m\alpha_m({}_m x + u_m) - m e^{\alpha_m({}_m x + u_m)}\right]$.

The sample size must be so large that the most probable mth extreme value $\tilde{x}_m$ is sufficiently near to $u_m$, which is defined as the solution of

$\Phi(u_m) = 1 - \frac{m}{n}$.


The factor a m defined by 


= 1 - 


is related to the asymptotic standard error a m of the mth extreme value by 


Oim &7n 




The joint asymptotic distribution $\mathfrak{w}({}_m x, w_m)$ of the mth smallest value and the mth range

(41) $w_m = x_m - {}_m x$

is

$\mathfrak{w}({}_m x, w_m) = \left[\frac{m^m}{(m-1)!}\right]^2 \alpha_m^2 \exp\left[-m\alpha_m(w_m - 2u_m) - m e^{\alpha_m({}_m x + u_m)} - m e^{-\alpha_m({}_m x + w_m - u_m)}\right]$.

The asymptotic distribution $g(w_m)$ of the mth range is, dropping the index $m$ of the variable ${}_m x$,

$g(w_m) = \left[\frac{m^m}{(m-1)!}\right]^2 \alpha_m^2\, e^{-m\alpha_m(w_m - 2u_m)} \int_{-\infty}^{+\infty} \exp\left[-m e^{\alpha_m(x + u_m)} - m e^{-\alpha_m(x + w_m - u_m)}\right] dx$.

Again we introduce a reduced range $R_m$ defined by

(42) $R_m = \alpha_m(w_m - 2u_m)$

and put for the integration

$\alpha_m(x + u_m) = y$.

Then the asymptotic distribution $\psi(R_m)$ of the reduced mth range is

(43) $\psi(R_m) = \left[\frac{m^m}{(m-1)!}\right]^2 e^{-mR_m} \int_{-\infty}^{+\infty} \exp\left[-m e^{y} - m e^{-y - R_m}\right] dy$.

The probability $\Psi(R_m)$ for the mth range

$\Psi(R_m) = \int_{-\infty}^{R_m} \psi(z)\, dz$

cannot be reduced to a single integral. This is due to the fact that the probabilities of the mth extreme values cannot be written down except in the integral form [6]. No differential equation similar to (17) exists. However, the function (43) could be calculated by numerical methods. The mean $\bar{R}_m$, the generating function and the moments of the mth range have been given in a previous paper [8].

For the sake of completeness, consider finally an unlimited asymmetrical initial distribution of the exponential type. In this case, the joint distribution of the smallest and of the largest value splits again, for large samples, into the product of the asymptotic distributions $f_1(x_1)$ and $f_n(x_n)$ of the smallest and of the largest values, which are now [6]

$f_1(x_1) = \alpha_1 \exp\left[\alpha_1(x_1 - u_1) - e^{\alpha_1(x_1 - u_1)}\right]$;

$f_n(x_n) = \alpha_n \exp\left[-\alpha_n(x_n - u_n) - e^{-\alpha_n(x_n - u_n)}\right]$.

Here, $\alpha_n$ and $u_n$ are defined, as previously, by (6) and (8). The sample must be so large that the most probable smallest value $\tilde{x}_1$ is sufficiently near to the solution of

$\Phi(u_1) = \frac{1}{n}$.


The factor ai defined by 


Oil = 




is related to the asymptotic standard error $\sigma_1$ of the smallest value by

$\alpha_1\sigma_1 = \frac{\pi}{\sqrt{6}}$.

The joint asymptotic distribution of the smallest value $x_1$ and the range $w$

$\mathfrak{w}(x_1, w) = \alpha_1\alpha_n \exp\left[\alpha_1(x_1 - u_1) - \alpha_n(x_1 + w - u_n) - e^{\alpha_1(x_1 - u_1)} - e^{-\alpha_n(x_1 + w - u_n)}\right]$

contains four parameters instead of the two which exist in the symmetrical case. However, the number of parameters may be reduced to one. We introduce a reduced range $R$ defined by

(44) $R = \alpha_n(w - u_n + u_1)$

being the range itself minus the range of the modes, divided by a factor proportional to the standard error of the largest value. If we put

(45) $\alpha_1(x_1 - u_1) = y; \quad \frac{\alpha_n}{\alpha_1} = \beta$

the distribution $\psi(R)$ of the reduced range becomes, in the asymmetrical case,

(46) $\psi(R) = e^{-R}\int_{-\infty}^{+\infty} \exp\left[y(1 - \beta) - e^{y} - e^{-\beta y - R}\right] dy$

and the probability $\Psi(R)$ for the reduced range is

(47) $\Psi(R) = \int_{-\infty}^{+\infty} \exp\left[y - e^{y} - e^{-\beta y - R}\right] dy$

a formula which may immediately be verified by differentiation with respect to $R$. The mode $\tilde{R}$ of the range is the solution of

$\psi(R) = e^{-R}\int_{-\infty}^{+\infty} \exp\left[y(1 - 2\beta) - R - e^{y} - e^{-\beta y - R}\right] dy$.

Contrary to the symmetrical case, the latter integral cannot be expressed by the probability, and no simple differential equation similar to (17) exists. The expressions (46) and (47) contain a single constant measuring the asymmetry of the initial distribution. In the symmetrical case, $\beta = 1$, we obtain, of course, the previous formulas (12) and (13). In the asymmetrical case, the mean, the variance, and the higher moments of the mth range may be derived from the generating function given in a previous paper [8].
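Since (47) has no closed form for general asymmetry, it has to be evaluated numerically. The sketch below is one way to do so, assuming composite Simpson integration over a finite interval of y; the function name, the interval from -40 to 40, the number of steps, and the overflow guard are all illustrative choices and not taken from the paper. Setting beta = 1 gives the symmetrical case, which can be compared with Table I.

```python
import math


def asymmetric_range_cdf(R, beta, y_min=-40.0, y_max=40.0, steps=8000):
    """Probability Psi(R) from (47), by composite Simpson integration over y."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = (y_max - y_min) / steps

    def integrand(y):
        # The inner exponent is capped at 700 so that math.exp cannot overflow;
        # the integrand is numerically zero long before the cap is reached.
        inner = math.exp(min(-beta * y - R, 700.0))
        return math.exp(y - math.exp(y) - inner)

    total = integrand(y_min) + integrand(y_max)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(y_min + k * h)
    return total * h / 3.0


if __name__ == "__main__":
    print("beta = 1.0, R = 0:", asymmetric_range_cdf(0.0, 1.0))
    print("beta = 0.5, R = 0:", asymmetric_range_cdf(0.0, 0.5))
```

For beta = 1 the values at R = 0 and R = 1 agree with the entries .27973 and .51654 of Table I to the accuracy of the quadrature.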

The asymptotic distribution of the mth range in the asymmetrical case can easily be obtained by combining the two procedures used in this paragraph.

Addition at proof reading:

G. Elfving's article "The asymptotical distribution of range in samples from a normal population", Biometrika, Vol. 36 (1947), appeared when this manuscript was ready for print. Elfving considers a probability transformation of the range whereas we deal with the range itself. His distribution requires the knowledge of the initial distribution and of the sample size, whereas this knowledge is not required in our asymptotic formula.

1. Summary. The means, variances, and covariances for samples of size $\le 10$ from the normal distribution, a selected long-tailed distribution, and the uniform distribution are tabled and compared with the usual asymptotic approximations. The methods of computation used and the accuracy expected are discussed. Use is made of the representation of an arbitrarily distributed variate as a monotone function of a uniformly (rectangularly) distributed variate. It is hoped that these tables will encourage experimentation with new statistical procedures.

2. Introduction. Two sorts of statistical procedures have been widely exploited in theoretical statistics: first, the use of linear and quadratic combinations of the unordered observations and, second, the use of ranked (ordered) observations. Statistics based on ordered observations have recently been dubbed systematic statistics [2, Mosteller, 1946]. Analytic processes and a few necessary numerical tables have advanced the study of the first procedure greatly, at least for the special case of the normal distribution, but analytic procedures have not done much to exhibit the behavior of systematic statistics and the necessary tables have been lacking.

It would be very helpful to have (1) at least the first two moments (including product moments) of the order statistics, and (2) tables of the percentage points of their distributions, for samples of sizes from 1 to some moderately large value such as 100 and for a large representative family of distributions. This is a large order and will require much computation.

The first step in this direction was taken by Fisher and Yates [1] by tabulating the means, to two decimal places, of all order statistics from normal samples of size $\le 50$. The present paper continues the process by supplying all means, variances, and covariances for samples of size $\le 10$ from (a) the normal distribution, (b) the uniform (rectangular) distribution, (c) a special distribution with long tails. For purposes of comparison, we also supply approximate means, variances, and covariances for the uniform and the special distribution computed from suitable asymptotic formulas.

The special distribution has the representing function

(1) $r(u) = (1 - u)^{-1/10} - u^{-1/10}$,

where $u$ has the uniform distribution on the interval [0, 1], and $x = r(u)$ is the variable whose order statistics interest us. This special distribution was especially constructed (1) to have high tails and (2) to provide moments of order statistics in closed form which could be evaluated with a reasonable amount of labor. The normal distribution is rather unreasonable in this latter respect, there being no known expression except in terms of single and double quadratures of some considerable numerical difficulty.

We have restricted ourselves to samples of size $\le 10$, and to only three distributions, all of these symmetrical, because of limited man-power rather than limited interest. Additional tables of a similar nature will surely prove helpful.

In order to obtain even as much information as provided in this paper, it has been necessary to make a joint effort, dividing the labor. The various parts of the work have been carried out more or less separately by the various authors: the means and variances for the normal by Mosteller, the covariances for the normal (which, with their double quadratures, required far more time than all the other thought and computation combined) by Hastings with some assistance from Mosteller, the choice of the special distribution by Tukey, and the computation for it by Winsor.


3. Results. In this section we provide the various tables that have been 
computed. 

Table I gives the mean and standard deviation of the ith order statistic $x(i \mid n)$ [or $x_{i,n}$; we use whichever notation seems less likely to confuse and agree that $x(1 \mid n) > x(2 \mid n) > \dots > x(n \mid n)$] from a sample of size $n$ drawn from a uniform (U), normal (N), and a special distribution (S). All three distributions have been adjusted to have zero mean and unit variance. In addition Table I gives approximations for the mean and standard deviation as computed from asymptotic formulas for the normal (AN) and the special (AS).

If $f(x)$ is the density function, the asymptotic approximation for the mean $m(i \mid n)$ of the ith order statistic from a sample of size $n$ is obtained by solving the equation

$\int_{m(i \mid n)}^{\infty} f(x)\, dx = \frac{i}{n + 1}$

for $m(i \mid n)$. Similarly the formula used for the asymptotic variance of $x(i \mid n)$ is

$\frac{i(n - i + 1)}{n(n + 1)^2\{f[m(i \mid n)]\}^2}$.
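For the normal distribution the two asymptotic formulas can be evaluated with the Python standard library alone. The sketch below is an illustration under that assumption: the function name is hypothetical, and `statistics.NormalDist` (Python 3.8 or later) stands in for the tables of the normal integral that would have been used originally.

```python
import math
from statistics import NormalDist


def asymptotic_mean_sd(i, n, dist=NormalDist()):
    """Asymptotic mean and standard deviation of the ith largest of n observations."""
    upper_tail = i / (n + 1)
    # Solve: integral of f from m to infinity = i / (n + 1).
    m = dist.inv_cdf(1.0 - upper_tail)
    variance = i * (n - i + 1) / (n * (n + 1) ** 2 * dist.pdf(m) ** 2)
    return m, math.sqrt(variance)


if __name__ == "__main__":
    for n, i in ((1, 1), (2, 1), (3, 1), (3, 2)):
        mean, sd = asymptotic_mean_sd(i, n)
        print(n, i, round(mean, 4), round(sd, 4))
```

The printed values can be compared with the AN columns of Table I, for example .4307 and .9168 for n = 2, i = 1.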

Values are given for n = 1, 2 , • • • , 10 and % — 1, • > • 

an entry in the table for means, a missing entry m(n — i + 11 n) = —m(i | ri) J 
if w(i | n) is an entry in the table of standard deviations, a missing entry 

w(n — i + 11 n) = w(i \ n). 

Table II gives the variances and covariances of the order statistics for the 
normal distribution ( N ) and the same quantities as approximated by the asymp- 


Ifwi is 
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TABLE I 


Means and standard deviations of order statistics x(i|n) for uniform distribution
(U), normal (N), special (S), asymptotic normal (AN),
asymptotic special (AS)

                        Mean                                  Standard Deviation
 n  i        U       N       S      AN     AS           U       N       S      AN     AS

 1  1        0       0       0      0      0      1.00000 1.00000 1.00000  1.2533  .9804

 2  1   .57735  .56419  .53493  .4307  .3418       .81650  .82665  .84490   .9168  .7486

 3  1   .86603  .84628  .80240  .6745  .5466       .67082  .74798  .82783   .7867  .6823
    2        0       0       0      0      0       .77460  .66983  .58457   .7236  .5660

 4  1                                              .56569  .70122  .82982   .7144  .6542
    2   .34641  .29701  .25540  .2533  .1992       .69282  .60038  .52682   .6340  .5035

 5  1  1.15470 1.16296 1.12449  .9674  .8136       .48795  .66898  .83642   .6670  .6415
    2   .57735  .49502  .42567  .4307  .3418       .61721  .55814  .50390   .5798  .4730
    3                                              .65465  .53557  .44903   .5605  .4384

 6  1  1.23718 1.26721 1.23847 1.0676  .9114       .42857  .64492  .84423   .6331  .6330
    2   .74231  .64176  .55458  .5659  .4539       .55328  .52874  .49425   .5426  .4567
    3   .24744  .20155  .16785  .1800  .1412       .60609  .49620  .41648   .5147  .4057

 7  1  1.29904 1.35218 1.33506 1.1504  .9957       .38188  .62603  .85217   .6072  .6141
    2   .86603  .75737  .65892  .6745  .5462       .50000  .50670  .48992   .5150  .4359
    3   .43301  .35271  .29375  .3186  .2512       .55902  .46875  .39963   .4826  .3772
    4        0       0       0      0      0       .57735  .45874  .37747   .4737  .3617

 8  1  1.34715 1.42360 1.41892 1.2207 1.0697       .34427  .61066  .85988   .5867  .6276
    2   .96225  .85222  .74690  .7647  .6259       .45542  .48930  .48823   .4936  .4402
    3                                              .51640  .44807  .38998   .4584  .3743
    4   .19245  .15251  .12502  .1397  .1094       .54433  .43264  .35616   .4447  .3494

 9  1  1.38564 1.48501 1.49358 1.2816 1.1358       .31334  .59780  .86725   .5691  .6268
    2  1.03923  .93230  .82317  .8416  .6954       .41779  .47508  .48800   .4763  .4361
    3   .69282  .57197  .47995  .5244  .4191       .47863  .43171  .38414   .4393  .3722
    4                                              .51168  .41303  .34321   .4227  .3356
    5        0       0       0      0      0       .52223  .40751  .33173   .4178  .3268

10  1  1.41713 1.53875 1.56057 1.3352 1.1956       .28748  .58681  .87423   .5557  .6275
    2  1.10222 1.00135  .89062  .9085  .7574       .38569  .46318  .48859   .4619  .4334
    3   .78730  .65608  .55336  .6046  .4866       .44536  .41826  .38054   .4238  .3604
    4   .47238  .37572  .30866  .3488  .2754       .48105  .39756  .33477   .4052  .3261
    5   .15746  .12274  .09961  .1142  .0894       .49793  .38857  .31190   .3973  .3117



totic formulas (AN). The asymptotic covariance between x(i | n) and x(j | n)
is given by

$$\frac{j(n-i+1)}{n(n+1)^2\, f[m(i|n)]\, f[m(j|n)]}, \qquad j \le i.$$

Symmetry relations exist for supplying the missing entries,

$$\operatorname{cov}[x(i|n),\, x(j|n)] = \operatorname{cov}[x(n-i+1|n),\, x(n-j+1|n)].$$

It might seem more natural to use the factor n + 2 rather than n in the
denominator of the asymptotic variances and covariances so that the formulas would
more nearly agree with those for the uniform distribution. However the use of
n gives much better approximations for the normal and the special distribution.
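
The asymptotic formulas above are easy to evaluate for the normal case. The following is a minimal sketch, not the authors' computing procedure: it assumes the convention that x(1 | n) is the largest order statistic, uses n (not n + 2) in the denominator, and relies on Python's standard-library `NormalDist`. The function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal: zero mean, unit variance


def asymptotic_mean(i: int, n: int) -> float:
    """m(i|n): the point whose upper-tail probability is i/(n+1)."""
    return _STD.inv_cdf(1.0 - i / (n + 1))


def asymptotic_cov(i: int, j: int, n: int) -> float:
    """Asymptotic covariance of x(i|n) and x(j|n), with n in the denominator."""
    lo, hi = min(i, j), max(i, j)
    f_i = _STD.pdf(asymptotic_mean(i, n))
    f_j = _STD.pdf(asymptotic_mean(j, n))
    return lo * (n - hi + 1) / (n * (n + 1) ** 2 * f_i * f_j)


def asymptotic_sd(i: int, n: int) -> float:
    """Asymptotic standard deviation of x(i|n)."""
    return sqrt(asymptotic_cov(i, i, n))


if __name__ == "__main__":
    # Compare with the AN columns for n = 2, i = 1.
    print(round(asymptotic_mean(1, 2), 4), round(asymptotic_sd(1, 2), 4))
```

For n = 2, i = 1 this agrees with the AN entries of Table I (mean .4307, standard deviation .9168) to the four decimals tabulated.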

Table III gives the variances and covariances of the order statistics for the
uniform distribution (U), and Table IV gives the corresponding results for the
special distribution (S). Table V gives the asymptotic variances and
covariances for the special distribution (AS).

Table VI compares the correlation coefficients between the order statistics
x(i | n) and x(j | n) for the uniform (U), the normal (N), and the special
distribution (S).

It seems worthwhile to call attention to the following: 

(1). Even for n = 10, the asymptotic formulas do not give satisfactory mean
values for the order statistics.

(2). For n > 8, the asymptotic standard deviations for the normal are close
enough to be very useful. For the special distribution we must except the two
order statistics on each end from this statement.

TABLE II 


Variances and covariances of the order statistics x(i\n) for the 
normal (N) and the asymptotic normal (AN) 




l 

2 

3 

4 

5 

6 

7 

8 

9 

10 


t 

N 

_ 

AN 

JV 

AN 

N 

AN 

N 

AN 

N 

AN 

N 

AN 

N 

AN 

N 

AN 

N 

AN 

N 

AN 

2 

1 

68 

84 

.32 

.42 

















3 

1 

.66 

62 

.28 

33 

17 

21 
















2 



.45 

52 

















4 

1 

49 

51 

.24 

28 

16 

18 

.11 

.13 














2 



36 

40 

.24 

.27 















5 

1 

46 

44 

.22 

.24 

.16 

17 

.11 

12 

07 

.09 












2 



31 

.34 

21 

23 

.15 

.17 














3 





m 

.31 















6 

1 

42 

40 

,21 

.22 

.13 

.15 

.11 

.12 

07 

.09 


07 










2 



28 


,19 

,20 

14 

16 

.10 

.12 












3 





26 

.26 

.18 

m 

• 












7 

1 

39 

,37 


.20 

.13 

14 

.10 

ii 


m 

.06 

07 

.05 

.05 








2 



.26 

.27 

17 


13 

.14 

10 

,ii 

08 

.09 










3 





,22 

.23 

17 

18 

.13 

.14 












4 







.21 

.22 













8 

D 


.34 


■ 

13 

.13 

m 

m 

1 

m 

■ 

E9 

m 


I 

1 






2 



.24 

.24 

.17 

.17 

.12 

.13 


10 

08 


.07 









3 






,21 

m 

.16 

12 

,13 


.ii 










4 







.19 


.15 

16 











9 

1 

.36 

.32 

18 

18 

,12 

.13 

m 

.10 

.07 


E 

.07 

05 

I 

04 

m 

1 

04 




2 



.23 

23 

16 

16 

u 

12 

09 


B 

.08 

06 

m 

05 

IBB 






3 





HQ 

m 

.14 

15 

.11 

,12 

HE 

Uni 

.08 

.08 








4 







.17 

18 

.14 

14 

12 

.12 










5 









.17 

,17 











10 

1 

34 

.31 

17 

.17 

.12 

.12 

B9 

m 

19 


■ 

m 

HI 




m 

m 

m 

.03 


2 



.21 

.21 

,14 

.15 

,ii 

.12 





.06 









3 





,17 

.18 

13 

.14 

11 

.11 

.; fj3 

IjjS 

08 


m 

.07 






4 



* 




.16 

.16 

.12 

.13 

.11 

ii 

.09 

jKi 








5 









15 

.10 

.13 

13 










(3). For n > 8, the asymptotic variances and covariances of the normal are
close enough for many, if not most purposes.

(4). For the special distribution, only the variances and covariances of
moderately central order statistics are adequately given by the asymptotic formulas.

TABLE III 


Variances and covariances for the uniform distribution (U)


w 

\ t 

*\ 

1 

2 

3 

4 

s 

6 

7 

8 

9 

10 

2 

1 

66667 

33333 









3 

1 

m 











2 


PH 









4 

1 


n 

16000 









2 



32000 








6 

n 

23810 

19047 

14286 

09522 

04762 







K9 


.38095 

28671 

19047 








n 



.42857 








0 


18367 

15306 

12245 


06122 

03061 








.30612 

24430 

.18367 

12245 










36735 

27551 







7 

i 

14583 

12500 

10417 

08333 








2 


.25000 

20833 

,16667 


08333 






3 



.31250 

E 








4 




.33333 







8 

1 

11852 


08889 

07407 


■ 

m 





2 


.20741 

. 17778 

14815 

.11852 

■Eli 

EH 





3 



,26667 

.22222 

17778 

,13333 






4 




.29630 







9 

1 

■egg 

iKm 


06545 

05455 

pn 


■ 

I 



2 


17455 

15273 




1 ! 

04363 




3 




B 

RH 

13091 

■ 





4 




26182 

2181t 

17455 






6 





2727S 






10 

1 

■ 

ei 

mm 

IS 


IB 

m 

11 

0165c 

.00826 


2 

■ 

14876 

.13221 

mm 



06611 

0495' 

IMIMIll 

1 


3 

■ 


.19835 

.1735 

14876 

1239' 

■ 

.07435 




4 

■ 



.2314C 

1983 

.1652' 

.13221 





5 

E 




2479. 

2066 






(5). The correlation coefficients change rather little from distribution to
distribution, the poorest approximation being for end order statistics.

TABLE IV

Variances and covariances for the special distribution (S)

It is believed that the means are correct to within one unit in the fifth decimal
and that the standard deviations are correct to within 2 or 3 units in the fifth
decimal.


TABLE V 


Variances and covariances of the special distribution as computed 
from asymptotic formulas 


n 

X 

1 

2 

3 

4 

5 

6 

7 

s 

9 

10 

2 

1 

2 


ffm 








3 

1 

46550 











2 











4 

1 

42792 

20168 

13444 

10698 








2 


.25347 

16898 








5 

1 

41156 

19167 

.12579 

09605 

08231 







2 


.22368 

.14679 









3 



19221 








6 

1 



12105 


07464 

06679 






2 


20861 

.13529 


.08341 







3 



16457 

.12343 







7 

1 

37715 

.17627 



06782 

.05842 

.05388 

mm 




2 



,12258 


07354 







3 



.14232 


08538 







4 




13731 

_ 




1 



S 

1 

.39389 

1S276 

.11746 

08669 

06935 

.05873 


.04924 




2 


,19382 

12458 


.07355 

06229 

05538 





3 



14011 


08272 

.07006 






4 




.12211 

09769 






9 

1 

.39286 

.18226 

11881 


,06829 

05727 

BS 

04556 

04367 



2 



.12308 

■ 

.07126 

■ 

m 

.04754 


4 


3 




1 


n 

m 





4 




11265 

| 08958 

07512 






5 











10 

1 

39373 

18242 

. 11677 

08560 

06775 

05646 

04891 

.04379 

04051 

03937 


2 


18784 

12024 

.08813 

06977 

05814 

05036 

04508 

04174 



3 



12988 

09620 

.07536 

.06280 

05440 

.04871 



' 

4 




10633 

08417 

07014 

06076 





5 





09716 

08098 






The evaluation of the covariances was much more troublesome, requiring the
evaluation of iterated integrals of the form

$$\int_{-\infty}^{\infty} x f(x) F^{i}(x) \int_{-\infty}^{x} t f(t)\,[1 - F(t)]^{j}\,dt\,dx.$$

Necessary linear combinations of such forms give rise to considerable loss of
accuracy. The covariances are believed to be correct to within 1 unit in the
second decimal (except for one or two values which may be off by two units).

TABLE VI 

Correlation coefficients × 10² between order statistics x(i | n), x(j | n) for the
uniform (U), normal (N), and special distribution (S)



Better tables of these covariances are badly needed, and it is hoped that someone
will provide them.

The asymptotic values are correct to the two decimals given.

5. Computation in terms of the representing function. It will prove
convenient in working with the special distribution, as indeed it does in many
statistical procedures, to introduce the representing function r(u), which is a
monotone function such that

$$\Pr\{r(u_1) \le x \le r(u_2)\} = u_2 - u_1, \qquad u_2 > u_1.$$

Thus if u has a uniform (= rectangular on [0, 1]) distribution then x = r(u)
defines a variate with the given distribution.

The ith order statistic of n from the uniform distribution, $u_{i|n}$, is distributed
according to

$$i\binom{n}{i}\,u^{n-i}(1-u)^{i-1}\,du, \qquad 0 \le u \le 1,$$

where it is important to remember that $u_{1|n}$ is the largest and not the smallest
order statistic; and the joint distribution of $u = u_{i|n}$ and $v = u_{j|n}$, (j > i),
is given by

$$i(j-i)\begin{bmatrix} n \\ i,\ j-i,\ n-j \end{bmatrix} v^{n-j}(u-v)^{j-i-1}(1-u)^{i-1}\,du\,dv, \qquad 0 \le v \le u \le 1,$$

where $\begin{bmatrix} n \\ i,\ j-i,\ n-j \end{bmatrix}$ is a multinomial coefficient.

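The means, variances, and covariances which we desire can be written as
follows (it is immaterial whether we think of expectations over x's or over u's):

$$E(x_{i|n}) = E(r(u_{i|n})) = i\binom{n}{i}\int_0^1 r(u)\,u^{n-i}(1-u)^{i-1}\,du,$$

$$\operatorname{var}(x_{i|n}) = E(x_{i|n}^2) - (E(x_{i|n}))^2 = E(r^2(u_{i|n})) - (E(x_{i|n}))^2$$

$$= i\binom{n}{i}\int_0^1 r^2(u)\,u^{n-i}(1-u)^{i-1}\,du - (E(x_{i|n}))^2,$$

$$\operatorname{cov}(x_{i|n}, x_{j|n}) = E(x_{i|n}x_{j|n}) - E(x_{i|n})E(x_{j|n}) = E(r(u_{i|n})\,r(u_{j|n})) - E(x_{i|n})E(x_{j|n})$$

$$= i(j-i)\begin{bmatrix} n \\ i,\ j-i,\ n-j \end{bmatrix}\int_0^1\!\!\int_v^1 r(u)\,r(v)\,v^{n-j}(u-v)^{j-i-1}(1-u)^{i-1}\,du\,dv - E(x_{i|n})E(x_{j|n}).$$

Introducing $E_{s,t}$ by

$$E_{s,t} = \int_0^1\!\!\int_v^1 r(u)\,r(v)\,u^{s}v^{t}\,du\,dv,$$

we have

$$E(x_{i|n}x_{j|n}) = i(j-i)\begin{bmatrix} n \\ i,\ j-i,\ n-j \end{bmatrix}\sum_{k=0}^{j-i-1}\sum_{m=0}^{i-1}(-1)^{\,j-i-1-k+m}\binom{j-i-1}{k}\binom{i-1}{m}E_{k+m,\;n-i-1-k}$$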

424 


HASTINGS, MOSTELLER, TUKEY, AND WINSOR


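and, in particular,

$$E(x_{1|2}\,x_{2|2}) = 2E_{0,0},$$

$$E(x_{1|5}\,x_{4|5}) = 60E_{2,1} - 120E_{1,2} + 60E_{0,3}.$$

Introducing $E_{s,s}$ by

$$E_{s,s} = \int_0^1 r^2(u)\,u^{s}\,du,$$

we have

$$E(x_{i|n}^2) = i\binom{n}{i}\sum_{k=0}^{i-1}(-1)^{k}\binom{i-1}{k}E_{n-i+k,\;n-i+k},$$

and, in particular,

$$E(x_{2|5}^2) = 20E_{3,3} - 20E_{4,4}.$$

Introducing $E_s$ by

$$E_s = \int_0^1 r(u)\,u^{s}\,du,$$

we have

$$E(x_{i|n}) = i\binom{n}{i}\sum_{k=0}^{i-1}(-1)^{k}\binom{i-1}{k}E_{n-i+k},$$

and, in particular,

$$E(x_{3|5}) = 30E_2 - 60E_3 + 30E_4.$$

Thus the computation of the desired means, variances, and covariances is
reduced to the computation of the integrals $E_s$, $E_{s,s}$, and $E_{s,t}$.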

We shall also want to calculate the asymptotic approximations to the means,
variances, and covariances of the order statistics. For the uniform distribution,
it is well known that

$$\operatorname{mean}(u_{i|n}) = \frac{n-i+1}{n+1},$$

$$\operatorname{var}(u_{i|n}) = \frac{i(n-i+1)}{(n+1)^2(n+2)},$$

$$\operatorname{cov}(u_{i|n}, u_{j|n}) = \frac{i(n-j+1)}{(n+1)^2(n+2)}, \qquad (i < j).$$

These asymptotic formulas are transformed from u to x by the relations x =
r(u) and dx = r′(u) du, giving

$$\text{approx mean}\,(x_{i|n}) = r\!\left(\frac{n-i+1}{n+1}\right),$$

$$\text{approx var}\,(x_{i|n}) = \left[r'\!\left(\frac{n-i+1}{n+1}\right)\right]^2 \frac{i(n-i+1)}{(n+1)^2(n+2)},$$

$$\text{approx cov}\,(x_{i|n}, x_{j|n}) = r'\!\left(\frac{n-i+1}{n+1}\right) r'\!\left(\frac{n-j+1}{n+1}\right)\frac{i(n-j+1)}{(n+1)^2(n+2)}, \qquad (i < j);$$

as noted above, in our calculations we have replaced n + 2 by n in the
denominator.
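
As a check on the uniform formulas, the sketch below evaluates them exactly and rescales to the zero-mean, unit-variance uniform used in the tables. It is an illustration under stated assumptions: the function names are made up here, and the rescaling factor √12 follows from the variance 1/12 of the uniform on [0, 1].

```python
from fractions import Fraction
from math import sqrt


def uniform_moments(i: int, j: int, n: int):
    """Exact mean of u_{i|n} and cov(u_{i|n}, u_{j|n}) on [0, 1].

    Order statistics are numbered from the largest (i = 1) downward;
    the arguments are reordered so that i <= j.
    """
    i, j = min(i, j), max(i, j)
    mean_i = Fraction(n - i + 1, n + 1)
    cov = Fraction(i * (n - j + 1), (n + 1) ** 2 * (n + 2))
    return mean_i, cov


def standardized(i: int, n: int):
    """Mean and standard deviation after rescaling to zero mean, unit variance."""
    mean_i, var = uniform_moments(i, i, n)
    scale = sqrt(12.0)  # the uniform on [0, 1] has variance 1/12
    return scale * (float(mean_i) - 0.5), scale * sqrt(float(var))


if __name__ == "__main__":
    print(standardized(1, 2))   # compare with the U columns of Table I, n = 2
    print(standardized(1, 10))  # compare with the U columns of Table I, n = 10
```

Multiplying the exact covariance by 12 gives the standardized entries of Table III; for n = 2 these are 2/3 and 1/3.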

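6. Reduction of integrals for the special case. When the representing
function is

$$x = r(u) = \frac{1}{(1-u)^{\lambda}} - \frac{1}{u^{\lambda}}, \qquad (\lambda > 0),$$

we obtain a symmetrical distribution with long tails. (For the normal
distribution r(u) = o(ln u) as u → 0.) The integrals we want are

$$E_s = \int_0^1 \{(1-u)^{-\lambda} - u^{-\lambda}\}\,u^{s}\,du,$$

$$E_{s,s} = \int_0^1 \{(1-u)^{-\lambda} - u^{-\lambda}\}^2\,u^{s}\,du,$$

$$E_{s,t} = \int_0^1\!\!\int_v^1 \{(1-u)^{-\lambda} - u^{-\lambda}\}\{(1-v)^{-\lambda} - v^{-\lambda}\}\,u^{s}v^{t}\,du\,dv,$$

which can be expressed as

$$E_s = A_s(\lambda) - B_s(\lambda),$$

$$E_{s,s} = A_{s,s}(\lambda) - 2B_{s,s}(\lambda) + C_{s,s}(\lambda),$$

$$E_{s,t} = A_{s,t}(\lambda) - B_{s,t}(\lambda) - C_{s,t}(\lambda) + D_{s,t}(\lambda),$$

where

$$A_s(\lambda) = \int_0^1 (1-u)^{-\lambda}u^{s}\,du = b(-\lambda, s),$$

$$B_s(\lambda) = \int_0^1 u^{-\lambda}u^{s}\,du = \frac{1}{s+1-\lambda},$$

$$A_{s,s}(\lambda) = \int_0^1 (1-u)^{-2\lambda}u^{s}\,du = b(-2\lambda, s),$$

$$B_{s,s}(\lambda) = \int_0^1 (1-u)^{-\lambda}u^{-\lambda}u^{s}\,du = b(-\lambda, s-\lambda),$$

$$C_{s,s}(\lambda) = \int_0^1 u^{-2\lambda}u^{s}\,du = \frac{1}{s+1-2\lambda},$$

$$A_{s,t}(\lambda) = \int_0^1\!\!\int_v^1 (1-u)^{-\lambda}(1-v)^{-\lambda}u^{s}v^{t}\,du\,dv = \sum_{i=0}^{s}\binom{s}{i}(-1)^{i}\,\frac{b(i+1-2\lambda,\ t)}{i+1-\lambda},$$

$$B_{s,t}(\lambda) = \int_0^1\!\!\int_v^1 (1-u)^{-\lambda}v^{-\lambda}u^{s}v^{t}\,du\,dv = \sum_{i=0}^{s}\binom{s}{i}(-1)^{i}\,\frac{b(i+1-\lambda,\ t-\lambda)}{i+1-\lambda} = \frac{b(s+t+1-\lambda,\ -\lambda)}{t+1-\lambda},$$

$$C_{s,t}(\lambda) = \int_0^1\!\!\int_v^1 u^{-\lambda}(1-v)^{-\lambda}u^{s}v^{t}\,du\,dv = \frac{1}{s+1-\lambda}\,[\,b(-\lambda, t) - b(s+t+1-\lambda,\ -\lambda)\,],$$

$$D_{s,t}(\lambda) = \int_0^1\!\!\int_v^1 u^{-\lambda}v^{-\lambda}u^{s}v^{t}\,du\,dv = \frac{1}{(t+1-\lambda)(s+t+2-2\lambda)},$$

where throughout

$$b(p, q) = \frac{p!\,q!}{(p+q+1)!} = \frac{\Gamma(p+1)\,\Gamma(q+1)}{\Gamma(p+q+2)} = B(p+1,\ q+1).$$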
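
The single-integral reductions above can be sketched numerically as follows. This is an illustration only: the value λ = 0.1 is a hypothetical choice made here (this section does not state the λ actually used for the special distribution), and the function names are ours. The sketch uses the expansion of E(x_{i|n}) in terms of E_s from the preceding section.

```python
from math import comb, gamma

LAM = 0.1  # hypothetical shape parameter; must be < 1/2 for finite variance


def b(p: float, q: float) -> float:
    """b(p, q) = Gamma(p+1) Gamma(q+1) / Gamma(p+q+2) = B(p+1, q+1)."""
    return gamma(p + 1) * gamma(q + 1) / gamma(p + q + 2)


def E_s(s: int, lam: float = LAM) -> float:
    """E_s = A_s - B_s for r(u) = (1-u)^(-lam) - u^(-lam)."""
    return b(-lam, s) - 1.0 / (s + 1 - lam)


def E_ss(s: int, lam: float = LAM) -> float:
    """E_{s,s} = A_{s,s} - 2 B_{s,s} + C_{s,s}."""
    return b(-2 * lam, s) - 2 * b(-lam, s - lam) + 1.0 / (s + 1 - 2 * lam)


def mean_order_stat(i: int, n: int, lam: float = LAM) -> float:
    """E(x_{i|n}) on the unstandardized scale, i = 1 being the largest."""
    return i * comb(n, i) * sum(
        (-1) ** k * comb(i - 1, k) * E_s(n - i + k, lam) for k in range(i)
    )


if __name__ == "__main__":
    # By symmetry the median of five should have mean zero
    # (30 E_2 - 60 E_3 + 30 E_4, up to rounding error).
    print(mean_order_stat(3, 5))
```

Because the distribution is symmetrical, E_0 and the mean of the middle order statistic of an odd sample vanish, which gives a quick sanity check on the formulas. Values on this scale would still need dividing by the standard deviation, the square root of `E_ss(0)`, to match a unit-variance tabulation.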


7. Calculations for the special distribution. The computations for the special
distribution were made from the formulas in the preceding section. The
quantities A, B, C, D were computed from r = s = 0 to r + s = 8, whence the values
of $E_s$, $E_{s,s}$, $E_{s,t}$ were calculated. The values of the means, variances, and
covariances were then obtained from the formulas of section 5.

The means, variances, and covariances are believed to be accurate to the five
decimal places given.

8. Formulas and accuracy for the uniform. The means, variances, and
covariances of the uniform are given near the end of section 5. Since r(u) = u,
they are also the values given by the asymptotic approximation, when n + 2
is used.

The tabulated values were computed to six decimal places and rounded to the
four or five decimals given.
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SEQUENTIAL CONFIDENCE INTERVALS FOR THE MEAN OF A NORMAL 
DISTRIBUTION WITH KNOWN VARIANCE 

By Charles Stein and Abraham Wald 
Columbia University 

1. Summary. We consider sequential procedures for obtaining confidence 
intervals of prescribed length and confidence coefficient for the mean of a normal 
distribution with known variance. A procedure achieving these aims is called 
optimum if it minimizes the least upper bound (with respect to the mean) of the 
expected number of observations. The result proved is that the usual
non-sequential procedure is optimum.

2. Introduction. The problem of sequential confidence sets in general has
been considered briefly by one of the authors [1]. Let $\{X_i\}$, (i = 1, 2, ···),
be a sequence of random variables whose distribution is specified except for the
value of a parameter θ whose range is a space Ω. Sequential confidence sets are
determined by a rule as to when to stop sampling, together with a function of
the sample whose value is one of a specified class of subsets of Ω. The class of
subsets is chosen in advance depending on the purpose of the estimation. For
example, it may be the class of all intervals of prescribed length or the class of
all sets whose diameter does not exceed a given value. It is required that the
probability that this (random) set covers θ should be greater than or equal to a
specified confidence coefficient α for all θ. A procedure for finding sequential
confidence intervals is considered optimum if it minimizes some specified function
of the expected numbers of observations. Here this function is taken to be the
least upper bound. In contrast with the result of this paper, a case where
sequential confidence intervals may have an advantage over non-sequential
procedures has been given by one of the authors [2]. The $X_i$ are independently
normally distributed with unknown mean and unknown variance, and the
problem is to find confidence intervals of fixed length for the unknown mean. As
was first shown by Dantzig [3] this cannot be accomplished by a non-sequential
procedure. Another case where this is true is the problem of finding confidence
intervals of the form $(p_0, kp_0)$ where k is a specified number greater than 1, for
the probability in a binomial distribution.

Let $\{X_i\}$, (i = 1, 2, ···), be independently normally distributed with
unknown mean ξ and known variance $\sigma_1^2$. It is desired to specify a sequential
procedure for obtaining confidence intervals of fixed length l for the mean ξ.
This is provided by a rule according to which at each stage of the experiment,
after obtaining the first m observations $X_1, \cdots, X_m$ for each integral value m,
one makes one of the following decisions:

a) Take an (m + 1)st observation.

b) Terminate the procedure and state that the mean lies in the interval
$(Y - \tfrac{1}{2}l,\ Y + \tfrac{1}{2}l)$, where Y is a measurable real-valued function of
$X_1, \cdots, X_m$.

The serial number m of the observation on which the procedure terminates is, of
course, a random variable and will be denoted by n.
For any relation R the symbol P(R | ξ) will denote the probability that R
holds when ξ is the true mean of $X_i$. The confidence coefficient of a sequential
procedure S is defined by

$$\alpha(S) = \operatorname*{g.l.b.}_{\xi}\ P(Y - \tfrac{1}{2}l \le \xi \le Y + \tfrac{1}{2}l \mid \xi). \tag{1}$$

Denote by $n_0(S)$ the maximum expected number of observations, i.e.

$$n_0(S) = \operatorname*{l.u.b.}_{\xi}\ E(n \mid \xi, S) \tag{2}$$

where E(n | ξ, S) denotes the expected value of n when ξ is the true mean and the
procedure S is used.

A procedure S will be considered optimum if, for all S′ such that α(S′) = α(S),

$$n_0(S) \le n_0(S'). \tag{3}$$

It will be shown that an optimum procedure S(ν, c) can be obtained as follows:

a) For all m < ν, a fixed positive integer, take another observation.

b) For m = ν, terminate the procedure if

$$\frac{1}{\sigma_1^2}\sum_{i=1}^{\nu}\Bigl(X_i - \frac{1}{\nu}\sum_{j=1}^{\nu}X_j\Bigr)^2 > c \tag{4}$$

and let $Y = \frac{1}{\nu}\sum_{i=1}^{\nu}X_i$. (The inequality (4) is used merely as a device for fixing
the probability of taking ν observations, this random event to be independent
of whether $(Y - \tfrac{1}{2}l,\ Y + \tfrac{1}{2}l)$ covers ξ, given ν.)

c) Otherwise take a (ν + 1)st observation, terminating the process, and let

$$Y = \frac{1}{\nu+1}\sum_{i=1}^{\nu+1}X_i.$$

When c = 0, this is the usual non-sequential procedure.

Clearly,

$$\alpha[S(\nu, c)] = P\{\chi^2_{\nu-1} > c\}\,H\!\left(\frac{\nu l^2}{4\sigma_1^2}\right) + \bigl[1 - P\{\chi^2_{\nu-1} > c\}\bigr]\,H\!\left(\frac{(\nu+1)l^2}{4\sigma_1^2}\right), \tag{5}$$

where

$$H(u) = \frac{1}{\sqrt{2\pi}}\int_0^{u} y^{-1/2}e^{-y/2}\,dy. \tag{6}$$

Also

$$n_0[S(\nu, c)] = \nu + 1 - P\{\chi^2_{\nu-1} > c\}. \tag{7}$$

By a proper choice of ν and c we can achieve any desired confidence coefficient.
There is no essential loss of generality in considering only the
case $\sigma_1 = 1$, and this will be done in the remainder of this paper.
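
To make the trade-off between confidence coefficient and expected sample size concrete, here is a small sketch for the case σ₁ = 1. It is not the authors' computation: it assumes H is the distribution function of chi-square with one degree of freedom (our reading of (25)), and it replaces P{χ²_{ν−1} > c} by a free parameter `q`, a hypothetical stand-in, so that no chi-square routine is needed.

```python
from math import erf, sqrt


def H(u: float) -> float:
    """P(chi-square with 1 d.f. <= u), i.e. P(|Z| <= sqrt(u)) for standard normal Z."""
    return erf(sqrt(u / 2.0))


def smallest_fixed_n(length: float, alpha: float) -> int:
    """Smallest fixed sample size whose interval of the given length has coverage >= alpha."""
    nu = 1
    while H(nu * length ** 2 / 4.0) < alpha:
        nu += 1
    return nu


def mixed_procedure(nu: int, q: float, length: float):
    """Coverage and expected sample size when the procedure stops at nu with
    probability q (hypothetical stand-in for P{chi2_{nu-1} > c}) and at nu + 1 otherwise."""
    coverage = q * H(nu * length ** 2 / 4.0) + (1.0 - q) * H((nu + 1) * length ** 2 / 4.0)
    expected_n = nu + 1 - q
    return coverage, expected_n


if __name__ == "__main__":
    print(smallest_fixed_n(1.0, 0.95))
    print(mixed_procedure(15, 0.5, 1.0))
```

With q = 1 the procedure is the fixed-sample rule with ν observations; intermediate q interpolates both the coverage and the expected sample size between ν and ν + 1.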


3. A lower bound for $n_0(S)$ and an upper bound for α(S). Consider any
sequential procedure S for obtaining confidence intervals of length l. Put

$$\alpha(\xi, S) = P\{Y - \tfrac{1}{2}l \le \xi \le Y + \tfrac{1}{2}l \mid \xi\}. \tag{8}$$

That is, α(ξ, S) is the probability that the confidence interval will cover the true
mean ξ when the procedure S is used. According to (1),

$$\alpha(S) = \operatorname*{g.l.b.}_{\xi}\ \alpha(\xi, S). \tag{9}$$

In order to obtain a lower bound for $n_0(S)$ and an upper bound for α(S), we
suppose that the procedure S is applied when ξ is not a fixed number but a
random variable normally distributed with mean 0 and variance σ². Then the
probability that the confidence interval covers ξ is

$$\alpha(\sigma, S) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty} e^{-\xi^2/2\sigma^2}\,\alpha(\xi, S)\,d\xi \ \ge\ \alpha(S) \tag{10}$$

and the expected number of observations is

$$E(n \mid \sigma, S) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty} e^{-\xi^2/2\sigma^2}\,E(n \mid \xi, S)\,d\xi \ \le\ n_0(S). \tag{11}$$

Let $p_m(\xi, S)$, (m = 1, 2, ···, ad inf.), denote the probability that n = m
when ξ is the true mean and procedure S is used. Put

$$p_m(\sigma, S) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty} e^{-\xi^2/2\sigma^2}\,p_m(\xi, S)\,d\xi. \tag{12}$$

Since

$$E(n \mid \sigma, S) = \sum_{m=1}^{\infty} m\,p_m(\sigma, S) \tag{13}$$

we obtain from (11)

$$\sum_{m=1}^{\infty} m\,p_m(\sigma, S) \le n_0(S). \tag{14}$$

We shall now derive an upper bound for α(σ, S). Since $X_i = \xi + \epsilon_i$ where the
$\epsilon_i$ are independently normally distributed with mean 0 and variance 1, the joint
distribution of ξ and $X_i$, (i = 1, ···, m), is a multivariate normal distribution
with means

$$E\xi = EX_i = 0 \tag{15}$$



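and covariance matrix

$$\Sigma(\xi, X_1, \cdots, X_m) = \begin{pmatrix} \sigma^2 & \sigma^2 & \sigma^2 & \cdots & \sigma^2 \\ \sigma^2 & \sigma^2+1 & \sigma^2 & \cdots & \sigma^2 \\ \sigma^2 & \sigma^2 & \sigma^2+1 & \cdots & \sigma^2 \\ \vdots & \vdots & \vdots & & \vdots \\ \sigma^2 & \sigma^2 & \sigma^2 & \cdots & \sigma^2+1 \end{pmatrix}. \tag{16}$$

Thus the conditional distribution of ξ given $X_1, \cdots, X_m$ is normal with mean

$$E(\xi \mid X_1, \cdots, X_m) = (\sigma^2, \cdots, \sigma^2)\begin{pmatrix} \sigma^2+1 & \sigma^2 & \cdots & \sigma^2 \\ \sigma^2 & \sigma^2+1 & \cdots & \sigma^2 \\ \vdots & \vdots & & \vdots \\ \sigma^2 & \sigma^2 & \cdots & \sigma^2+1 \end{pmatrix}^{-1}\begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix}$$

$$= \sigma^2(1, 1, \cdots, 1)\begin{pmatrix} \frac{(m-1)\sigma^2+1}{m\sigma^2+1} & \frac{-\sigma^2}{m\sigma^2+1} & \cdots & \frac{-\sigma^2}{m\sigma^2+1} \\ \frac{-\sigma^2}{m\sigma^2+1} & \frac{(m-1)\sigma^2+1}{m\sigma^2+1} & \cdots & \frac{-\sigma^2}{m\sigma^2+1} \\ \vdots & \vdots & & \vdots \\ \frac{-\sigma^2}{m\sigma^2+1} & \frac{-\sigma^2}{m\sigma^2+1} & \cdots & \frac{(m-1)\sigma^2+1}{m\sigma^2+1} \end{pmatrix}\begin{pmatrix} X_1 \\ \vdots \\ X_m \end{pmatrix} = \frac{\sigma^2}{m\sigma^2+1}\sum_{i=1}^{m}X_i \tag{17}$$

and variance

$$\sigma^2 - \frac{m\sigma^4}{m\sigma^2+1} = \frac{\sigma^2}{m\sigma^2+1}. \tag{18}$$

If $X_1, \cdots, X_m$ is a sequence for which the process is terminated on the mth
trial, the conditional probability that the interval of length l will cover ξ is
clearly maximized by taking

$$Y = E(\xi \mid X_1, \cdots, X_m) = \frac{\sigma^2}{m\sigma^2+1}\sum_{i=1}^{m}X_i \tag{19}$$

and, by (18), this probability has the value $H(c_m)$ where H is defined by (6) and

$$c_m = \frac{l^2(m\sigma^2+1)}{4\sigma^2}. \tag{20}$$

Hence,

$$\alpha(\sigma, S) \le \sum_{m=1}^{\infty} p_m(\sigma, S)\,H(c_m). \tag{21}$$

From this and (10) we obtain

$$\alpha(S) \le \sum_{m=1}^{\infty} p_m(\sigma, S)\,H(c_m). \tag{22}$$

This upper limit of α(S) and the lower limit of $n_0(S)$ given in (14) will be used
later to prove that S(ν, c) is an optimum procedure.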

4. Maximum value of $\sum_{m=1}^{\infty} p_m(\sigma, S)H(c_m)$ subject to the condition that
$\sum_{m=1}^{\infty} m\,p_m(\sigma, S)$ does not exceed a given bound. We shall show that the maximum
of $\sum_{m=1}^{\infty} p_m(\sigma, S)H(c_m)$ subject to

$$E(n \mid \sigma, S) = \sum_{m=1}^{\infty} m\,p_m(\sigma, S) \le \nu + a,$$

where ν is a positive integer and 0 < a < 1, is obtained by choosing $p_m(\sigma, S) = p_m^*$, defined by

$$p_m^* = 0 \ \text{ for } m < \nu \text{ or } m > \nu + 1, \qquad p_\nu^* = 1 - a, \qquad p_{\nu+1}^* = a. \tag{23}$$

For, suppose to the contrary that there exists a sequence $\{p_m\}$ such that the
following conditions hold:

$$p_m \ge 0, \qquad \sum_{m=1}^{\infty} p_m = 1,$$

$$\sum_{m=1}^{\infty} m\,p_m \le \nu + a = \sum_{m=1}^{\infty} m\,p_m^*, \tag{24}$$

$$\sum_{m=1}^{\infty} p_m H(c_m) > \sum_{m=1}^{\infty} p_m^* H(c_m).$$

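We have

$$H(u) = \frac{2}{\sqrt{2\pi}}\int_0^{\sqrt{u}} e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}}\int_0^{u} y^{-1/2}e^{-y/2}\,dy. \tag{25}$$

Put

$$C = H(c_{\nu+1}) - H(c_\nu) = \frac{1}{\sqrt{2\pi}}\int_{c_\nu}^{c_{\nu+1}} y^{-1/2}e^{-y/2}\,dy. \tag{26}$$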




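With the aid of $p_\nu = 1 - \sum_{m \ne \nu} p_m$, we obtain from the last two inequalities
in (24)

$$0 < \sum_{m=1}^{\infty}(p_m - p_m^*)H(c_m) - C\sum_{m=1}^{\infty}(p_m - p_m^*)\,m = \sum_{m \ne \nu}(p_m - p_m^*)K_m \tag{27}$$

where

$$K_m = H(c_m) - H(c_\nu) - (m - \nu)[H(c_{\nu+1}) - H(c_\nu)]. \tag{28}$$

Clearly $K_{\nu+1} = 0$. Also, for m < ν, since the integrand is a strictly decreasing
function of y,

$$K_m = \frac{1}{\sqrt{2\pi}}\Bigl[(\nu - m)\int_{c_\nu}^{c_{\nu+1}} y^{-1/2}e^{-y/2}\,dy - \int_{c_m}^{c_\nu} y^{-1/2}e^{-y/2}\,dy\Bigr]$$

$$< \frac{c_{\nu+1} - c_\nu}{\sqrt{2\pi}}\Bigl[(\nu - m)\,y^{-1/2}e^{-y/2}\Big|_{y=c_\nu} - (\nu - m)\,y^{-1/2}e^{-y/2}\Big|_{y=c_\nu}\Bigr] = 0. \tag{29}$$

Similarly for m > ν + 1, $K_m < 0$. But $p_m^* = 0$ for m ≠ ν, ν + 1 so that

$$\sum_{m \ne \nu}(p_m - p_m^*)K_m \le 0 \tag{30}$$

which contradicts (27) since $K_{\nu+1} = 0$.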
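
The sign of K_m can also be checked numerically. The sketch below is illustrative only: it assumes H is the chi-square (1 d.f.) distribution function as in (25), and it takes c_m = l²(mσ² + 1)/(4σ²), which is our reading of the quantity obtained from the conditional variance (18); the names and parameter values are arbitrary.

```python
from math import erf, sqrt


def H(u: float) -> float:
    """Distribution function of chi-square with 1 d.f., as in (25)."""
    return erf(sqrt(u / 2.0))


def c(m: int, length: float, sigma: float) -> float:
    """Assumed form of c_m: (l/2)^2 divided by the conditional variance sigma^2/(m sigma^2 + 1)."""
    return length ** 2 * (m * sigma ** 2 + 1.0) / (4.0 * sigma ** 2)


def K(m: int, nu: int, length: float, sigma: float) -> float:
    """K_m of (28): H(c_m) minus the secant line through m = nu and m = nu + 1."""
    h_m = H(c(m, length, sigma))
    h_nu = H(c(nu, length, sigma))
    h_nu1 = H(c(nu + 1, length, sigma))
    return h_m - h_nu - (m - nu) * (h_nu1 - h_nu)


if __name__ == "__main__":
    nu, length, sigma = 5, 1.0, 2.0  # arbitrary illustrative values
    for m in range(1, 11):
        print(m, K(m, nu, length, sigma))
```

Since c_m is linear in m and H is concave, K_m is the gap between a concave curve and its secant through ν and ν + 1, so it is zero at those two points and negative elsewhere.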

Thus, we have shown that the inequality

$$E(n \mid \sigma, S) \le \nu + a \tag{31}$$

implies the inequality

$$\sum_{m=1}^{\infty} p_m(\sigma, S)\,H(c_m) \le (1 - a)H(c_\nu) + aH(c_{\nu+1}). \tag{32}$$

5. Proof that S(ν, c) is an optimum procedure. Since, according to (14)
and (22)

$$n_0(S) \ge E(n \mid \sigma, S) \quad\text{and}\quad \alpha(S) \le \sum_{m=1}^{\infty} p_m(\sigma, S)\,H(c_m), \tag{33}$$

it follows from the result expressed in (31) and (32) that, for any procedure S
satisfying the inequality

$$n_0(S) \le \nu + a, \tag{34}$$

we must have

$$\alpha(S) \le (1 - a)H(c_\nu) + aH(c_{\nu+1}) \tag{35}$$

identically in σ. Since H(u) is continuous, it follows that

$$\alpha(S) \le (1 - a)H(\tfrac{1}{4}\nu l^2) + aH(\tfrac{1}{4}(\nu+1)l^2) \tag{36}$$

for any procedure S satisfying (34).

The right hand side of (36) is α[S(ν, c)] where c is chosen so that

$$1 - a = P\{\chi^2_{\nu-1} > c\}. \tag{37}$$

We use an indirect proof to show that S(ν, c) is an optimum procedure.
Suppose to the contrary that there is a procedure S′ such that

$$\alpha(S') = \alpha[S(\nu, c)] \tag{38}$$

but

$$n_0(S') < n_0[S(\nu, c)]. \tag{39}$$

By (5) and (7), α[S(ν, c)] is a continuous strictly increasing function of

$$\nu + 1 - P\{\chi^2_{\nu-1} > c\}$$

and this latter is $n_0[S(\nu, c)]$. If we choose ν′, c′ so that

$$n_0(S') \le \nu' + 1 - P\{\chi^2_{\nu'-1} > c'\} < \nu + 1 - P\{\chi^2_{\nu-1} > c\}, \tag{40}$$

it follows that

$$\alpha[S(\nu', c')] < \alpha[S(\nu, c)] = \alpha(S'). \tag{41}$$

But (41) and the first part of (40) contradict the result expressed in (34) and
(36).
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NOTES 

This section is devoted to brief research and expository articles on methodology 
and other short items. 

A USEFUL CONVERGENCE THEOREM FOR 
PROBABILITY DISTRIBUTIONS 

By Henry Scheffé

University of California at Los Angeles 

In problems of establishing limiting distributions it is often apparent that the
probability density $p_n(x)$ of a random variable $X_n$ has a limit p(x); throughout
this paper n = 1, 2, 3, ···, and all limits are taken as n → ∞. If p(x) is the
density of a random variable X, what we really care about then is whether the
limits apply to probabilities, which involve integrals of the densities: Does
lim Pr{$X_n$ in S} = Pr{X in S} for all$^1$ Borel sets S, or, does

$$\lim \int_S p_n(x)\,dx = \int_S p(x)\,dx\ ? \tag{1}$$

The question is thus one of taking a limit under an integral sign. Perhaps the
most widely used justification of such a process is the following theorem of
Lebesgue [1, p. 47; 2, p. 29]: If for a sequence $\{f_n(x)\}$ of integrable functions,
lim $f_n(x)$ = f(x) for almost all x in S, then a sufficient condition that

$$\lim \int_S f_n(x)\,dx = \int_S f(x)\,dx$$

is that there exist an integrable function g(x) which uniformly dominates the
sequence $\{f_n(x)\}$, that is, $|f_n(x)| \le g(x)$ for all n and all x in S, and $\int_S g(x)\,dx < \infty$.

For example, in the excellent new treatise by Cramér the limiting form of the
t-distribution is treated as follows [1, p. 252, other examples on pp. 369,
371]: For n degrees of freedom the t-variable has the density

$$p_n(x) = c_n(1 + x^2/n)^{-\frac{1}{2}(n+1)}, \tag{2}$$

where

$$c_n = (n\pi)^{-\frac{1}{2}}\,\Gamma(\tfrac{1}{2}(n+1))/\Gamma(\tfrac{1}{2}n). \tag{3}$$

It is shown fairly easily that lim $p_n(x)$ = p(x), the density of N(0, 1), where
N(m, σ²) denotes the normal distribution with mean m and variance σ². Then
to prove

$$\lim \int_{-\infty}^{\xi} p_n(x)\,dx = \int_{-\infty}^{\xi} p(x)\,dx$$

Cramér shows that $\{p_n(x)\}$ is uniformly dominated by an integrable function.

1 In defining the convergence of a sequence of distributions to the distribution of a
discontinuous random variable X it is desirable to modify this requirement so that it is
demanded only of sets S which are continuity intervals of X [1, p. 83]. We are concerned here
however only with the "absolutely continuous case" where X has a probability density p(x).

It is instructive to consider some examples where

$$\lim \int_{-\infty}^{\xi} p_n(x)\,dx \tag{4}$$

does not equal

$$\int_{-\infty}^{\xi} \lim p_n(x)\,dx. \tag{5}$$

In the examples (i), (ii), (iii), lim $p_n(x)$ = 0 for all x and hence (5) is zero for
all ξ.

(i) $p_n(x) = 1$ for −n − 1 < x < −n, zero elsewhere. Then (4) equals 1 for all ξ.

(ii) $p_n(x) = 1/n$ for −½n < x < ½n, zero elsewhere. Here (4) equals ½ for all ξ.

(iii) $p_n(x) = 2n^2x$ for 0 < x < 1/n, zero elsewhere. Now (4) is zero for
ξ ≤ 0, unity for ξ > 0.

An example in which lim $p_n(x)$ ≠ 0 is

(iv) $p_n(x) = \tfrac{1}{2}[h_n(x) + p_0(x)]$, where $h_n$ is the $p_n$ of one of the above examples
and $p_0$ is a fixed density. Then lim $p_n(x) = \tfrac{1}{2}p_0(x)$. Now (4) exceeds (5) by
half the amount it did in the corresponding above example.

The essential features of these examples could be obtained with normal
distributions but would involve a little more computation, for instance, N(−n, 1),
N(0, n²), N(1/n, 1/n⁴), for examples (i), (ii), (iii), respectively.

We note that in none of these examples is lim $p_n(x)$ a density. This suggests
that the trouble might perhaps be prevented by requiring that lim $p_n(x)$ be a
density, which happens in the case from which we started. This surmise is
correct. We may formalize the situation as follows:

Definition. A function f(x ) will be called a density if it is non-negatwe and 

/ f(x) dx = 1. Here R denotes the whole space of x. 

The reader may think of a univariate density, where a; is a real variable and 
R is the real axis, but theorem and proof run the same for a fc-variate density, 
where a: is a point in a fc-dimensional Euclidean space R. 

Theorem 2 . If for a sequence { p n (x )} of densities 

lim p n (x) = p(x) 

2 The hypotheses of this theoiem, while perfectly adapted to applications m probability 
and statistics, would not seem the “natural 1 ' ones in real vanable or measure theory Pro¬ 
fessor A, P, Morse has remarked to the writer that, if the theorem has not been stated in this 
form before, it is at least an easy corollaiy of some more general results Imown in that field. 
Nevertheless our direct pi oof based only on the familiar Lebesgue theorem and using only 
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for almost all x in R, then a sufficient condition that 

lim / p n (x) dx = / p(x ) dx, 

uniformly for all Borel sets S in R, is that p(x) he a density. 

Proof. Let us write the difference 

(6) p n (x) - p{%) = S„(x) 

Then 

( 7 ) S^x) -> 0 
for almost all x in R. Also 

(8) Sndx = p„ dx - p dx, 

Jg Jg Js 

and so it suffices to prove that / <5„ dx — > 0 uniformly for all S in R, where S 

Js 

henceforth denotes a Borel set. If in (8) we let S — R we get 


(9) 


[ S n dx = 0 
Jr 


Since p„ and p are densities. We now split the difference 5„(a;) into its positive 
and negative parts: Let 

( 10 ) + 1 8 n I ), K = 4 (^« _ | &71 | ), 

so that 

5n = St + 5~, St >0, 5~ < 0. 

Fi'oni (7) and (10), we find 

( 11 ) in —>" 0 
for almost all x in R, and from (9), 

(12) f Stdx+ f i~ dx =0. 

Jr Jr 


very simple manipulations may be of interest to readers of the Annals. Professor Morse 

jJso pointed out that the stronger result lim 1 | p n (x) — p(x) | dx = 0 uniformly for all S, 

Js 


may be stated. This follows from our proof since 
By virtue of (6), $\delta_n \ge -p$. Now if $\delta_n \le 0$, $\delta_n^- = \delta_n \ge -p$, and if $\delta_n > 0$, $\delta_n^- = 0 \ge -p$, and hence in every case $0 \ge \delta_n^- \ge -p$. Since we now have $|\delta_n^-(x)| \le p(x)$ and $\int_R p(x)\,dx = 1$, we may apply³ the Lebesgue theorem to get

\[ \lim \int_R \delta_n^-\,dx = \int_R \lim \delta_n^-\,dx. \]

The right member is zero because of (11). It then follows from (12) that $\lim \int_R \delta_n^+\,dx$ is also zero. The relations

\[ 0 \le \int_S \delta_n^+\,dx \le \int_R \delta_n^+\,dx \to 0, \]

\[ 0 \ge \int_S \delta_n^-\,dx \ge \int_R \delta_n^-\,dx \to 0 \]

guarantee that the quantities $\int_S \delta_n^+\,dx$ and $\int_S \delta_n^-\,dx$ have the limit zero uniformly for all $S$, and hence the same is true of their sum (8).
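The proof above also shows what the uniform bound is: the supremum over Borel sets $S$ of $|\int_S \delta_n\,dx|$ is attained on the set where $\delta_n > 0$ and equals $\int_R \delta_n^+\,dx$. The following is a minimal numerical sketch of that quantity. It assumes, purely for illustration, that $p_n$ is the Student $t$ density with $n$ degrees of freedom and $p$ the standard normal density; the function names, the truncation interval and the step count are choices made here, not part of the text.

```python
import math

def normal_pdf(x):
    """Standard normal density p(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def sup_set_difference(n, half_width=40.0, steps=80_000):
    """Midpoint-rule approximation of the integral of delta_n^+ = max(p_n - p, 0).

    p_n is taken to be the Student t density with n degrees of freedom
    (an assumed example); the integral is truncated to [-half_width, half_width].
    """
    # Logarithm of the normalising constant of the t density.
    log_c = (math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0)
             - 0.5 * math.log(n * math.pi))
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        p_n = math.exp(log_c - (n + 1) / 2.0 * math.log1p(x * x / n))
        delta = p_n - normal_pdf(x)
        if delta > 0.0:          # keep only the positive part delta_n^+
            total += delta
    return total * h

if __name__ == "__main__":
    for n in (2, 5, 20, 100):
        print(n, sup_set_difference(n))
```

The printed values shrink as $n$ grows, which is the uniform convergence asserted by the theorem for this particular sequence.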

Returning to the example (2), we remark that it is practically obvious that the second factor on the right has the limit $e^{-\frac12 x^2}$, but it is not quite so obvious that $\lim c_n = (2\pi)^{-1/2}$. This situation is typical of many applications where it is more difficult to evaluate the limit of "the" constant than the limit of the remaining factors, and one wonders after obtaining the latter limit whether the constant is not automatically forced toward the limit desired for it, and whether the direct calculation of its limit could not be avoided. Let us put the question as follows: Suppose that

\[ \{p_n(x) = c_n f_n(x)\} \]

is a sequence of densities and that

\[ p(x) = c f(x) \]

is also a density. Then if $\lim f_n(x) = f(x)$ for almost all $x$, may we conclude that $\lim c_n = c$? If so, we could then apply the above theorem without having evaluated the limit of the constant or produced a dominating function. Unfortunately the answer to this question is no, as shown by example (iv) above:


3 Although our proof rests on the Lebesgue convergence theorem, this theorem is applied to $\delta_n^-(x)$ and not to $p_n(x)$. While in most cases of practical interest the sequence $\{p_n(x)\}$ is uniformly dominated by an integrable function, it is possible to devise a simple example where this is not true and yet our theorem applies. Let $p_n(x) = 1$ for $1/(n+1) < x < 1$ and for $a_n < x < a_{n+1}$, zero elsewhere, where $a_n = \sum_{i=1}^{n} 1/i$. Then $\sup_n p_n(x) = 1$ for all $x > 0$, nevertheless $\lim p_n(x)$ is a density, namely that of the uniform distribution on (0, 1).
If we let $f_n(x) = h_n(x) + p_0(x)$, and $f(x) = p_0(x)$, then $\lim f_n(x) = f(x)$, but $c_n = \tfrac12$ and $c = 1$, hence $\lim c_n \ne c$. Employing the assumption that $p_n(x)$ and $p(x)$ are densities we see

\[ 1/c_n = \int_R f_n(x)\,dx, \qquad 1/c = \int_R f(x)\,dx, \]

and hence $\lim c_n = c$ if and only if

\[ \lim \int_R f_n(x)\,dx = \int_R \lim f_n(x)\,dx. \tag{13} \]

It follows that in such cases if we wish to establish a limiting distribution in the sense (1), we may either prove $\lim c_n = c$, or we may justify (13), say by producing a suitable dominating function, but we need not do both. No doubt the first alternative would be preferable at all but the most advanced levels of teaching or exposition.
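The failure of $\lim c_n = c$ can be checked numerically. The sketch below is only an illustration under stated assumptions: example (iv) is not reproduced in this passage, so $h_n$ is taken here to be the uniform density on $[n, n+1)$ and $p_0$ the standard normal density. Both choices, and all function names, are hypothetical stand-ins.

```python
import math

def p0(x):
    """Assumed limiting density: standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def h(n, x):
    """Assumed 'escaping' density: uniform on [n, n + 1)."""
    return 1.0 if n <= x < n + 1 else 0.0

def f(n, x):
    """f_n = h_n + p_0, which tends to p_0 pointwise as n grows."""
    return h(n, x) + p0(x)

def integral(g, lo, hi, steps=80_000):
    """Plain midpoint rule on [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(g(lo + (k + 0.5) * width) for k in range(steps))

def c(n):
    """Normalising constant c_n = 1 / integral of f_n."""
    return 1.0 / integral(lambda x: f(n, x), -40.0, 40.0)

if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, c(n))           # stays near 1/2, although c = 1
```

Under these assumptions the unit of mass carried by $h_n$ moves off to infinity, so (13) fails and $c_n$ stays near $1/2$ while $c = 1$.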
AN EXPLICIT REPRESENTATION OF A STATIONARY
GAUSSIAN PROCESS

By M. Kac¹ and A. J. F. Siegert

Cornell University and Syracuse University

1. In a paper which will soon appear in the Journal of Applied Physics [1] the authors have introduced methods of calculating certain probability distributions which are of importance in the theory of random noise in radio receivers.

The complexity of the physical problem and occasional uses of heuristic reasonings may have obscured some of the mathematical points. For this reason the authors felt that it may be worth while to illustrate one of the basic ideas on a simple but important example.

2. A stationary Gaussian process is a one parameter family $x(t)$ of random variables such that:

(a) $x(t)$ is normally distributed, the mean and the variance being independent of $t$;

(b) the joint probability distribution of $x(t_1), x(t_2), \ldots, x(t_r)$ is multivariate Gaussian whose parameters depend only on the differences $t_i - t_j$.


1 John Simon Guggenheim Memorial Fellow.
We assume, for the sake of simplicity, that the process is normalized, i.e.,

\[ E\{x(t)\} = 0, \qquad E\{x^2(t)\} = 1, \]

and we define the correlation function $\rho(\tau)$ by the usual formula

\[ \rho(\tau) = E\{x(t)\,x(t+\tau)\}. \]

It is then well known² that a distribution function $\sigma(u)$ exists such that for all $\tau$

\[ \rho(\tau) = \int_{-\infty}^{\infty} \cos u\tau \; d\sigma(u). \tag{1} \]

3. Let $0 \le s, t \le T$ and consider the symmetric kernel

\[ K(s, t) = \rho(s - t). \]

The fact that $\sigma(u)$ is non-decreasing implies that the kernel $\rho(s - t)$ is quasi-definite, i.e., for every $L^2$ function $g(t)$ on $(0, T)$ one has

\[ \int_0^T\!\!\int_0^T g(s)\,\rho(s - t)\,g(t)\,ds\,dt \ge 0. \]

Thus the eigenvalues of the integral equation

\[ \int_0^T \rho(s - t)\,f(t)\,dt = \lambda f(s) \tag{2} \]

are non-negative. Moreover, denoting by $\lambda_j$ the eigenvalues and by $f_j(t)$ the corresponding normalized eigenfunctions of (2) we have by the classical theorem of Mercer (see [4], in particular part 6 of Ch. I) that

\[ \rho(s - t) = \sum_j \lambda_j f_j(s) f_j(t), \tag{3} \]

where the series on the right is absolutely and uniformly convergent. It should be noted that in virtue of (1) $\rho(\tau)$ is a continuous function.

4. Let now $G_1, G_2, G_3, \ldots$ be independent, normally distributed random variables each having mean 0 and variance 1.

Consider the series

\[ \sum_j \sqrt{\lambda_j}\, G_j f_j(t). \tag{4} \]

Since for each $t$ we have

\[ \sum_j \bigl(\sqrt{\lambda_j}\, f_j(t)\bigr)^2 = \sum_j \lambda_j f_j^2(t) = \rho(0) = 1, \]

we infer that for each $t$ the series (4) converges in the mean to a random variable $x(t)$. Moreover, by a theorem of Kolmogoroff [5], the series (4) converges, for each $t$, to $x(t)$ with probability 1.

2 See [2]. The theorem in question (in a somewhat different form) seems to have been first established by N. Wiener in [3].
Thus we may write

\[ x(t) = \sum_j \sqrt{\lambda_j}\, G_j f_j(t). \tag{5} \]

It is now easy to show that $x(t)$ thus defined is a stationary Gaussian process in $(0, T)$ with the correlation function $\rho(\tau)$.

In fact,

\[ E\{x(s)\,x(t)\} = \sum_j \lambda_j f_j(s) f_j(t) = \rho(s - t), \qquad 0 \le s, t \le T, \]

and conditions (a) and (b) of section 2 follow from the well known properties of linear combinations of independent Gaussian random variables. Of course, we are dealing here with infinite linear combinations but the mean convergence, noted above, is sufficient to justify the extension to our case.
5. It is more illuminating to think of the random variables $G_j$ as measurable functions $G_j(\omega)$ defined on an abstract set $\Omega$ in which a Lebesgue measure has been established (the measure of the whole space being 1).

The representation (5) can then be written in the equivalent form

\[ x(t, \omega) = \sum_j \sqrt{\lambda_j}\, G_j(\omega) f_j(t). \tag{6} \]

The equality, as established in section 4, holds for every $t$ in the sense of mean convergence. Moreover, by the theorem of Kolmogoroff cited above, and by Fubini's theorem the equality (6) holds for almost every pair $(t, \omega)$, $(0 \le t \le T)$, in the sense of ordinary convergence.

Furthermore by Mercer's theorem (remember that $\lambda_j \ge 0$)

\[ \sum_j \lambda_j = \int_0^T \rho(s - s)\,ds = T \]

and hence

\[ \sum_j \lambda_j E\{G_j^2\} = \sum_j \lambda_j \int_\Omega G_j^2(\omega)\,d\omega = \sum_j \lambda_j = T < \infty. \]

Thus

\[ \sum_j \lambda_j G_j^2(\omega) \]

converges for almost every $\omega$ and therefore the series

\[ \sum_j \sqrt{\lambda_j}\, G_j(\omega) f_j(t) \tag{7} \]

converges in the mean for almost every $\omega$.

Combining this fact with the observation that (7) converges almost everywhere to $x(t, \omega)$ we see that, for almost every $\omega$, the series (7) converges in the mean to $x(t, \omega)$ and that consequently
\[ \int_0^T x^2(t, \omega)\,dt = \sum_j \lambda_j G_j^2(\omega) \tag{8} \]

for almost every $\omega$.

It should be noted that (8) could not, in general, be derived by just appealing to Parseval's relation. The main reason is that Parseval's relation holds only for complete orthonormal systems whereas the orthonormal system $\{f_j(t)\}$ of eigenfunctions may fail to be complete. If the kernel $\rho(s - t)$ is positive-definite (in which case all the eigenvalues are positive instead of just non-negative) then it is known that the eigenfunctions form a complete set. This actually happens to be the case in most physical applications.

6. An important application of (8) is the calculation of the characteristic function of the distribution function of the random variable

\[ I = \int_0^T x^2(t, \omega)\,dt. \tag{9} \]

In fact,

\[ E\{\exp(i\xi I)\} = \prod_j E\{\exp(i\xi\lambda_j G_j^2)\} = \prod_j (1 - 2i\xi\lambda_j)^{-1/2}. \tag{10} \]

The probability density of $I$ is the Fourier integral

\[ \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(-i\xi I)\,\prod_j (1 - 2i\xi\lambda_j)^{-1/2}\,d\xi \]

which, unfortunately, in most cases cannot be calculated explicitly. If

\[ \rho(\tau) = e^{-\beta|\tau|}, \]

in which case the process is also Markoffian, the eigenvalues $\lambda_j$ can be calculated explicitly³ but in more complicated cases it is quite difficult to determine them.
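The product formula (10) is easy to test by simulation once the eigenvalues are known. The sketch below truncates the sum in (8) to a short, purely hypothetical list of eigenvalues (not derived from any particular $\rho$) and compares the closed form with a Monte Carlo average. The function names, the seed and the sample size are choices made here for illustration.

```python
import cmath
import math
import random

def cf_product(xi, eigenvalues):
    """Closed form of (10): product over j of (1 - 2 i xi lambda_j)^(-1/2).

    Each factor has real part 1, so the principal branch of the
    square root is the correct one factor by factor.
    """
    value = 1.0 + 0.0j
    for lam in eigenvalues:
        value *= (1.0 - 2.0j * xi * lam) ** -0.5
    return value

def cf_monte_carlo(xi, eigenvalues, samples=40_000, seed=0):
    """Empirical E[exp(i xi I)] with I = sum_j lambda_j G_j^2, G_j ~ N(0, 1)."""
    rng = random.Random(seed)
    acc = 0.0 + 0.0j
    for _ in range(samples):
        value_of_i = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigenvalues)
        acc += cmath.exp(1j * xi * value_of_i)
    return acc / samples

if __name__ == "__main__":
    lams = [0.5, 0.3, 0.2]       # hypothetical eigenvalues, summing to T = 1
    print(cf_product(0.7, lams))
    print(cf_monte_carlo(0.7, lams))
```

The two printed complex numbers should agree to within Monte Carlo error, which is of order $1/\sqrt{\text{samples}}$.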

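7. If $\rho(\tau)$ is absolutely integrable and $\sigma(u)$ absolutely continuous then, setting

\[ A(u) = \sigma'(u), \]

we have $A(u) \ge 0$ and

\[ \rho(\tau) = \int_{-\infty}^{\infty} \cos u\tau\, A(u)\,du = \int_{-\infty}^{\infty} e^{iu\tau} B(u)\,du, \qquad B(u) = \frac{A(u) + A(-u)}{2}. \]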


3 See [6], in particular section 4. We take this opportunity to correct two misprints in this note. In the last formula on p. 64 M should be replaced by N. Also the limits of integration in formula (6) should be 0, s and s, p + q instead of 0, p + q and 0, p + q.

The N.D.R.C. Report 14-305 to which a reference is made has been declassified in the meantime. It contains results which originated both [1] and the present note.

4 These and related results were stated in the abstract [7] by M. Kac. The paper is now being prepared for publication.
It can then be shown⁴ that

\[ \lim_{T\to\infty} \frac{1}{T}\sum_j \lambda_j^2 = 2\pi\int_{-\infty}^{\infty} B^2(u)\,du = \int_{-\infty}^{\infty} \rho^2(\tau)\,d\tau \]

and

\[ \lim_{T\to\infty} \frac{1}{T}\sum_j \lambda_j^3 = (2\pi)^2\int_{-\infty}^{\infty} B^3(u)\,du. \]

It follows now by standard methods that the characteristic function of 



approaches, as $T \to \infty$,



where

\[ \sigma^2 = \int_{-\infty}^{\infty} \rho^2(\tau)\,d\tau. \]

Thus, as $T \to \infty$, the distribution of (11) becomes normal with mean 0 and variance $\sigma^2$.
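The first limit in section 7 can be checked without computing any eigenvalue. Squaring Mercer's expansion (3) and integrating over the square gives $\sum_j \lambda_j^2 = \int_0^T\!\int_0^T \rho^2(s - t)\,ds\,dt$, and for an even $\rho$ this equals $2\int_0^T (T - u)\,\rho^2(u)\,du$. The sketch below evaluates that one-dimensional integral for the assumed special case $\rho(\tau) = e^{-|\tau|}$, for which $\int \rho^2\,d\tau = 1$. The function names and step count are illustrative choices.

```python
import math

def rho(tau):
    """Assumed correlation function: exponential with unit decay rate."""
    return math.exp(-abs(tau))

def mean_square_eigenvalue(T, steps=200_000):
    """(1/T) * sum_j lambda_j^2, computed as (2/T) * integral_0^T (T - u) rho(u)^2 du.

    Uses the midpoint rule; no eigenvalue is ever computed.
    """
    h = T / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += (T - u) * rho(u) ** 2
    return 2.0 * total * h / T

def closed_form(T):
    """Exact value of the same quantity for rho(tau) = exp(-|tau|)."""
    return 1.0 - 1.0 / (2.0 * T) + math.exp(-2.0 * T) / (2.0 * T)

if __name__ == "__main__":
    for T in (1.0, 10.0, 1000.0):
        print(T, mean_square_eigenvalue(T), closed_form(T))
```

As $T$ grows the printed ratio approaches $\int \rho^2(\tau)\,d\tau = 1$, in agreement with the stated limit.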
APPROXIMATE FORMULAS FOR THE RADII OF CIRCLES
WHICH INCLUDE A SPECIFIED FRACTION OF A
NORMAL BIVARIATE DISTRIBUTION

By E. N. Oberg
University of Iowa

1. Introduction. Given the normal bivariate error distribution

\[ \varphi(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}\, e^{-\frac12\left(x^2/\sigma_x^2 + y^2/\sigma_y^2\right)}. \tag{1} \]

The purpose of this paper is to present certain approximate formulas for the radii of circles whose centers are at the origin, which include a prescribed proportion, $p$, of errors. The formulas are, for given $\sigma_x$, $\sigma_y$, and $p$,
\[ R_1 = \sqrt{2\sigma_x\sigma_y \ln(1/[1 - p])}, \tag{2} \]

\[ R_2 = \sqrt{(\sigma_x^2 + \sigma_y^2) \ln(1/[1 - p])}, \tag{3} \]

and

\[ R_3 = (\sigma_x + \sigma_y)\sqrt{\tfrac12 \ln(1/[1 - p])}. \tag{4} \]

In section 3 we present tables of $p'$, the true proportion of errors contained in circles whose radii are given by the above formulas. These tables reflect the goodness of approximation of each formula to the true radius, $R$, for $0.1 \le p \le 0.9$ and $0.5 \le \sigma_x/\sigma_y \le 0.9$. Also, a brief statement is included for the same range of $p$ but with $0.1 \le \sigma_x/\sigma_y \le 0.4$.

2. The derivation of the formulas. The proportion $p$ of errors that fall within an area $A$ on the $xy$-plane is given by

\[ p = \iint_A \varphi(x, y)\,dA. \tag{5} \]

If the area is bounded by any member of the family of ellipses

\[ x^2/\sigma_x^2 + y^2/\sigma_y^2 = \lambda^2, \]

the above integral may be evaluated directly. The result is

\[ p = 1 - e^{-\lambda^2/2}, \]

whence

\[ \lambda^2 = 2\ln(1/[1 - p]). \]

Thus the ellipse with semi-axes

\[ \sigma_x\sqrt{2\ln(1/[1 - p])}, \qquad \sigma_y\sqrt{2\ln(1/[1 - p])}, \tag{6} \]

measured from the origin along the $x$ and $y$ axes respectively, will include exactly the prescribed proportion of errors.
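The direct evaluation of (5) over the ellipse can be written out in one line. What follows is a sketch of the standard elliptical polar substitution $x = \sigma_x r\cos\theta$, $y = \sigma_y r\sin\theta$, whose Jacobian is $\sigma_x\sigma_y r$; the variables $r$ and $\theta$ are introduced here only for this illustration.

```latex
\[
p \;=\; \iint_{x^2/\sigma_x^2 + y^2/\sigma_y^2 \,\le\, \lambda^2} \varphi(x, y)\,dx\,dy
  \;=\; \frac{1}{2\pi}\int_0^{2\pi}\!\!\int_0^{\lambda} e^{-r^2/2}\, r\,dr\,d\theta
  \;=\; \Bigl[-e^{-r^2/2}\Bigr]_0^{\lambda}
  \;=\; 1 - e^{-\lambda^2/2}.
\]
```

Solving $1 - e^{-\lambda^2/2} = p$ for $\lambda^2$ gives $\lambda^2 = 2\ln(1/[1 - p])$, and the semi-axes of the ellipse are $\sigma_x\lambda$ and $\sigma_y\lambda$.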

Frequently, however, it is desired to know which circles rather than which ellipses include a certain proportion of the errors. In this case it becomes difficult to obtain a formula for the true radius from (5) unless $\sigma_x = \sigma_y$, in which case $R$ is given by either one of the formulas in (6). However, a natural approximation to make is to equate the area of a circle of radius, say $R_1$, to the area of the ellipse whose semi-axes are given in (6). This gives formula (2),

\[ R_1 = \sqrt{2\sigma_x\sigma_y \ln(1/[1 - p])}, \]

which can be expected to give a fairly close approximation to true $R$ if $\sigma_x$ is close to $\sigma_y$. If $\sigma_x \ne \sigma_y$, it has been shown that this formula underestimates true $R$, which is undesirable in some applications [1]. That is, if $R_1$ is used to estimate, say, the radius of a circle to include 50% of the errors ($p = .5$), it will give a value which includes less than the desired proportion. The first table in the last section gives a numerical verification of this fact.
To obtain formula (3) we consider formula (5) when $A$ is a circle of radius $R$. We have

\[ p = 4\int_0^R\!\!\int_0^{\sqrt{R^2 - x^2}} \varphi(x, y)\,dy\,dx. \]

By making the transformation $x = \sigma_x r\cos\theta$, $y = \sigma_y r\sin\theta$, and by carrying out the integration with respect to $r$ the above formula becomes

\[ p = 1 - \frac{2}{\pi}\int_0^{\pi/2} \exp\left\{-\frac{R^2}{(\sigma_x^2 + \sigma_y^2) - (\sigma_y^2 - \sigma_x^2)\cos 2\theta}\right\} d\theta. \]

We let

\[ \alpha = R^2/(\sigma_x^2 + \sigma_y^2), \qquad \beta = (\sigma_y^2 - \sigma_x^2)/(\sigma_x^2 + \sigma_y^2), \]

and

\[ \epsilon = \sigma_x/\sigma_y, \qquad \sigma_x < \sigma_y. \]

Then $\alpha = R^2/\sigma_y^2(1 + \epsilon^2)$ and $\beta = (1 - \epsilon^2)/(1 + \epsilon^2)$, which is less than unity. This substitution will be helpful later in preparing tables. The fact that $\sigma_x$ is taken less than $\sigma_y$ places no limitation on the final results since we only have to interchange axes in the other case. The above integral may now be written as

\[ p = 1 - \frac{2}{\pi}\int_0^{\pi/2} e^{-\alpha/(1 - \beta\cos 2\theta)}\,d\theta
     = 1 - \frac{2}{\pi}\,e^{-\alpha}\int_0^{\pi/2} e^{-\alpha\beta\cos 2\theta/(1 - \beta\cos 2\theta)}\,d\theta. \tag{7} \]

The integrand, say $F(\theta)$, in the last integral of (7) can be shown to be monotone increasing from $e^{-\alpha\beta/(1 - \beta)}$ to $e^{\alpha\beta/(1 + \beta)}$ as $\theta$ varies from 0 to $\pi/2$. Furthermore, it crosses the line $F(\theta) = 1$ somewhere in this interval and differs but little from it anywhere if the ratio $\sigma_x/\sigma_y$ is close to 1, since $\beta$ is then close to zero. If, therefore, we replace the integrand by $F(\theta) = 1$, we have $p = 1 - e^{-\alpha}$. Hence, if $\alpha$ is replaced by $R^2/(\sigma_x^2 + \sigma_y^2)$ and the result solved for $R$, we have formula (3),

\[ R_2 = \sqrt{(\sigma_x^2 + \sigma_y^2) \ln(1/[1 - p])}. \]

Finally, formula (4),

\[ R_3 = (\sigma_x + \sigma_y)\sqrt{\tfrac12 \ln(1/[1 - p])}, \]

is obtained by taking the root-mean-square of the former two. This formula has certain advantages over the other two, the most obvious being that $\sigma_x$ and $\sigma_y$ enter linearly so that it is simple to evaluate for given $\sigma_x$, $\sigma_y$, and $p$. Secondly it will be seen by the tables and additional comments made in the last section that when $p = 0.5$,¹ $R_3$ overestimates true $R$ by a slight amount for all values of $\sigma_x/\sigma_y$, and it gives a fairly close approximation to true $R$ for all $p$ when $\sigma_x/\sigma_y \ge 0.5$.

1 This particular value of $p$ gives the circular probable error. In this case $R_3 = 0.5887(\sigma_x + \sigma_y)$.
We close this section by making a few brief comments. In the first place, if any of the above formulas is to be computed from a sample of data, we take $\sqrt{\Sigma x^2/(n - 1)}$ and $\sqrt{\Sigma y^2/(n - 1)}$ as estimates of $\sigma_x$ and $\sigma_y$ respectively. Furthermore, we test the significance of these statistics by known formulas [2].

Finally, $\sigma_x$ and $\sigma_y$ may be replaced by $\sqrt{\pi/2}\,D_x$ and $\sqrt{\pi/2}\,D_y$, where $D_x$ is the population mean deviation. Thus, for example,

\[ R_3 = (D_x + D_y)\sqrt{\tfrac{\pi}{4} \ln(1/[1 - p])}. \]
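Formulas (2) to (4) and the integral for the true proportion are straightforward to evaluate numerically. The sketch below computes the three radii and then the proportion of errors inside a circle of radius $R$ from $p = 1 - (2/\pi)\int_0^{\pi/2} e^{-\alpha/(1 - \beta\cos 2\theta)}\,d\theta$, with $\alpha = R^2/(\sigma_x^2 + \sigma_y^2)$ and $\beta = (\sigma_y^2 - \sigma_x^2)/(\sigma_x^2 + \sigma_y^2)$. The function names and the midpoint rule with 2000 steps are choices made here, not the author's computing procedure.

```python
import math

def radii(sigma_x, sigma_y, p):
    """Return (R1, R2, R3) from formulas (2), (3) and (4)."""
    log_term = math.log(1.0 / (1.0 - p))
    r1 = math.sqrt(2.0 * sigma_x * sigma_y * log_term)
    r2 = math.sqrt((sigma_x ** 2 + sigma_y ** 2) * log_term)
    r3 = (sigma_x + sigma_y) * math.sqrt(0.5 * log_term)
    return r1, r2, r3

def true_proportion(radius, sigma_x, sigma_y, steps=2000):
    """Proportion of errors inside the circle of the given radius.

    Midpoint-rule evaluation of 1 - (2/pi) * integral over [0, pi/2]
    of exp(-alpha / (1 - beta * cos(2 theta))).
    """
    s2 = sigma_x ** 2 + sigma_y ** 2
    alpha = radius ** 2 / s2
    beta = (sigma_y ** 2 - sigma_x ** 2) / s2
    h = (math.pi / 2.0) / steps
    total = sum(
        math.exp(-alpha / (1.0 - beta * math.cos(2.0 * (k + 0.5) * h)))
        for k in range(steps)
    )
    return 1.0 - (2.0 / math.pi) * total * h

if __name__ == "__main__":
    sx, sy, p = 0.5, 1.0, 0.5            # ratio sigma_x / sigma_y = 0.5
    for name, r in zip(("R1", "R2", "R3"), radii(sx, sy, p)):
        print(name, round(r, 4), round(true_proportion(r, sx, sy), 4))
```

Each printed proportion plays the role of $p'$ for the corresponding formula, so comparing it with the chosen $p$ shows directly how far that radius under- or overshoots.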

3. Tables. The first formula in (7) is useful in testing by means of numerical integration the goodness of approximation of the formulas $R_1$, $R_2$, and $R_3$ to

TABLE I

p' computed by means of formula R_1

σ_x/σ_y   p = .1     .2      .25     .3      .4      .5      .6      .7      .75     .8      .9
  .5       .0988   .1951   .2425   .2893   .3815   .4720   .5615   .6508   .6960   .7422   .8408
  .6       .0944   .1974   .2459   .2942   .3899   .4846   .5786   .6726   .7198   .7676   .8668
  .7       .0997   .1987   .2480   .2972   .3950   .4924   .5894   .6864   .7350   .7838   .8835
  .8       .0999   .1995   .2492   .2989   .3981   .4970   .5958   .6946   .7440   .7936   .8935
  .9       .1000   .1999   .2498   .2997   .3996   .4993   .5991   .6988   .7483   .7986   .8985
 1.0       .1000   .2000   .2500   .3000   .4000   .5000   .6000   .7000   .7500   .8000   .9000

the true value of $R$. We construct the tables by replacing $R$ in $\alpha$ by one of these formulas, say formula $R_1$. This gives $\alpha = [2\epsilon/(1 + \epsilon^2)]\ln[1/(1 - p)]$. Since $\beta = (1 - \epsilon^2)/(1 + \epsilon^2)$, the right hand side of the formula in (7) may then be evaluated for a choice of $\epsilon$ and $p$, giving a value we denote by $p'$. This is the actual proportion of errors that is included in the circle whose radius is $R_1$. If $R_1$ gave true $R$, then $p'$ would be equal to $p$, so we may regard the difference of $p$ and $p'$ as a measure of the error arising when $R_1$ is used to estimate $R$.

In the following tables the chosen values of $p$ and $\epsilon = \sigma_x/\sigma_y$ are listed in the first row and column respectively. The remainder of the tables include the corresponding values of $p'$.

We also have computed tables for $0.1 \le \sigma_x/\sigma_y \le 0.4$ which we have not included in this paper since for this range of values of $\sigma_x/\sigma_y$, all of the formulas give approximations that depart considerably from true $R$ except $R_3$ when $p = 0.5$. For this case, $p' = .4776$, $.5004$, $.5109$, and $.5120$ when $\sigma_x/\sigma_y = 0.1$, $0.2$, $0.3$, and $0.4$ respectively.
The difference between an entry in a column and the corresponding value of $p$ at the head of the column reflects the error in estimating true $R$ by means of $R_1$, $R_2$, and $R_3$. For example, if $p$ is chosen as .5 and $\sigma_x/\sigma_y = .7$ then $R_3$ gives the radius of a circle which includes 50.13% of the errors. Thus $R_3$ overestimates true $R$ by including .13% more of the errors.

By examining the tables it is seen that when $0.1 \le p \le 0.3$, $R_1$ gives the best approximation to the true value of $R$, while $R_2$ gives the poorest. If $0.4 \le p \le$

TABLE II

p' computed by means of formula R_2

σ_x/σ_y   p = .1     .2      .25     .3      .4      .5      .6      .7      .75     .8      .9
  .5       .1215   .2363   .2912   .3446   .4467   .5432   .6346   .7217   .7641   .8060   .8907
  .6       .1116   .2202   .2732   .3255   .4274   .5201   .6218   .7146   .7600   .8050   .8949
  .7       .1057   .2100   .2616   .3127   .4140   .5136   .6116   .7081   .7558   .8032   .8976
  .8       .1022   .2039   .2546   .3051   .4050   .5055   .6048   .7034   .7525   .8014   .8991
  .9       .1005   .2009   .2509   .3012   .4013   .5012   .6011   .7008   .7506   .8003   .8999
 1.0       .1000   .2000   .2500   .3000   .4000   .5000   .6000   .7000   .7500   .8000   .9000

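TABLE III

p' computed by means of formula R_3

σ_x/σ_y   p = .1     .2      .25     .3      .4      .5      .6      .7      .75     .8      .9
  .5       ....    .2161   .2674   .3176   .4152   .5092   ....    .6887   .7327   .7768   ....
  .6       ....    ....    .2597   ....    ....    .5059   ....    ....    ....    .7872   .8817
  .7       ....    ....    .2548   ....    .4046   .5031   ....    ....    .7456   .7937   ....
  .8       .1011   ....    ....    .3020   .4018   ....    ....    ....    .7483   .7976   ....
  .9       ....    ....    ....    ....    ....    ....    ....    .6998   ....    .7995   .8992
 1.0       ....    ....    ....    ....    ....    ....    ....    ....    ....    ....    ....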

0.75, $R_3$ gives the best and $R_2$ the poorest; and if $0.8 \le p \le 0.9$, $R_2$ gives the best and $R_1$ the poorest. Thus formula $R_3$ for general use gives the best overall approximation. It may be remarked at this point that bounds for the true value of $R$ can be found by applying two of the formulas, one of which overestimates while the other underestimates $R$. From the tables it is apparent that this can be done for values of $p \le 0.8$.

Finally, these formulas may be used to test roughly the normality of the data. For example, if proper estimates² of $\sigma_x$ and $\sigma_y$ are made from the data, and the corresponding value of $R_3$ computed for a chosen $p$, then approximately, the proportion $p'$ of plotted errors should fall within the circle of radius $R_3$.

2 See section 2.
A NOTE ON THE EFFICIENCY OF THE WALD SEQUENTIAL TEST

By Edward Paulson

Institute of Statistics, University of North Carolina


The sequential likelihood ratio test of Wald for testing the hypothesis $H_0$ that the probability density function is $f(X, \theta_0)$ against the one-sided alternative $H_1$ that the function is $f(X, \theta_1)$ has been shown [1] to have the optimum property of minimizing the expected number of observations at the two points $\theta = \theta_0$ and $\theta = \theta_1$. Tables showing the actual magnitude of the percentage saving of this sequential procedure compared with the classical "best" non-sequential test have been calculated (see [1], page 147) for the normal case when

\[ f(X, \theta) = \frac{1}{\sqrt{2\pi}}\, e^{-(X - \theta)^2/2}. \]

In this note we will show that when $\theta_1$ is close to $\theta_0$, the percentage saving is independent of the particular function $f(X, \theta)$ and the particular values $\theta_1$ and $\theta_0$, so that the tables mentioned above can be used to show the percentage saving for any one-sided sequential test involving a single parameter, provided $f(X, \theta)$ satisfies some weak restrictions.

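Let $f(X, \theta)$ be the probability density function of a random variable. Let $E_i(n)$ denote the expected value (when $\theta = \theta_i$) of the number of independent observations required by the Wald sequential procedure to test the hypothesis $H_0$ that $\theta = \theta_0$ against $\theta = \theta_1 = \theta_0 + \Delta$ with probabilities $\alpha$ of rejecting $H_0$ when $\theta = \theta_0$ and $\beta$ of accepting $H_0$ when $\theta = \theta_1$. Let $N$ be the number of independent observations required to achieve the same probabilities $\alpha$ and $\beta$ for testing the hypothesis $\theta = \theta_0$ against $\theta = \theta_1$ by the most powerful non-sequential test. Let $U_\alpha$ and $U_\beta$ be defined by the relations

\[ \frac{1}{\sqrt{2\pi}}\int_{U_\alpha}^{\infty} e^{-t^2/2}\,dt = \alpha \]

and

\[ \frac{1}{\sqrt{2\pi}}\int_{U_\beta}^{\infty} e^{-t^2/2}\,dt = \beta. \]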
We will prove the following theorem:

\[ \lim_{\Delta \to 0} \frac{E_0(n)}{N} = -2\,\frac{\alpha \log\left(\frac{1 - \beta}{\alpha}\right) + (1 - \alpha)\log\left(\frac{\beta}{1 - \alpha}\right)}{(U_\alpha + U_\beta)^2}, \]

provided $f(X, \theta)$ satisfies the following conditions:

(A) $\int_{-\infty}^{\infty} f(X, \theta)\,dx$ can be differentiated twice under the integral sign with respect to $\theta$.

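(B) All four of the integrals

\[ \int_{-\infty}^{\infty} \left[\frac{f'(x, \theta^*)}{f(x, \theta^*)}\right]^2 f(x, \theta_0)\,dx, \qquad
   \int_{-\infty}^{\infty} \frac{f''(x, \theta^*)}{f(x, \theta^*)}\, f(x, \theta_0)\,dx, \]

\[ \int_{-\infty}^{\infty} \frac{f'(x, \theta_0)}{f(x, \theta_0)}\, f'(x, \theta^*)\,dx, \qquad
   \int_{-\infty}^{\infty} \left[\frac{f'(x, \theta_0)}{f(x, \theta_0)}\right]^2 f(x, \theta^*)\,dx \]

are continuous functions of $\theta^*$ at $\theta^* = \theta_0$. A sufficient condition for (B) is that all the integrals be uniformly convergent with respect to $\theta^*$ in some interval $\theta_0 \le \theta^* \le \theta_0 + \Delta$, and all the integrands be continuous functions of $X$ and $\theta^*$. A similar theorem holds regarding the limit of $E_1(n)/N$ as $\Delta \to 0$.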

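The proof is as follows: From [1], we know that

\[ E_0(n) = \frac{\alpha \log\left(\frac{1 - \beta}{\alpha}\right) + (1 - \alpha)\log\left(\frac{\beta}{1 - \alpha}\right)}{E_0(z)} + o(1), \]

where

\[ z = \log\left[\frac{f(x, \theta_1)}{f(x, \theta_0)}\right] \]

and $o(1) \to 0$ as $\Delta \to 0$. Now

\[ E_0(z) = \int_{-\infty}^{\infty} [\log f(x, \theta_0 + \Delta)]\,f(x, \theta_0)\,dx - \int_{-\infty}^{\infty} [\log f(x, \theta_0)]\,f(x, \theta_0)\,dx. \]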
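Expanding $\log f(x, \theta_0 + \Delta)$ in a Taylor series about $\Delta = 0$, we have

\[ \log f(x, \theta_0 + \Delta) = \log f(x, \theta_0) + \Delta\left[\frac{f'}{f}\right]_{\theta = \theta_0}
   + \frac{\Delta^2}{2}\left[\frac{f''}{f} - \left(\frac{f'}{f}\right)^2\right]_{\theta = \theta_0} + \frac{\Delta^2}{2}\,R_1, \]

where

\[ \theta_0 < \theta^* < \theta_0 + \Delta, \qquad f' = \frac{\partial f(x, \theta)}{\partial\theta}, \qquad f'' = \frac{\partial^2 f(x, \theta)}{\partial\theta^2}, \]

\[ R_1 = \left[\frac{f''}{f} - \left(\frac{f'}{f}\right)^2\right]_{\theta = \theta^*} - \left[\frac{f''}{f} - \left(\frac{f'}{f}\right)^2\right]_{\theta = \theta_0}. \]

From assumption (A) we find that

\[ \int_{-\infty}^{\infty} f'(x, \theta_0)\,dx = 0 \quad\text{and}\quad \int_{-\infty}^{\infty} f''(x, \theta_0)\,dx = 0, \]

while from assumption (B)

\[ \int_{-\infty}^{\infty} R_1\, f(x, \theta_0)\,dx \to 0 \quad\text{as } \Delta \to 0. \]

Therefore

\[ E_0(z) = -\frac{\Delta^2}{2}\left\{E_0\left(\left[\frac{f'}{f}\right]^2_{\theta = \theta_0}\right) + o(1)\right\}. \]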

To find $N$ for the most powerful non-sequential test, we make use of the fact (see [2]) that an asymptotically most powerful test for one-sided alternatives is given by a region of the type

\[ U_N = \frac{1}{\sqrt{N}}\sum_{i=1}^{N} \frac{f'(x_i, \theta_0)}{f(x_i, \theta_0)} \ge K. \]

When $\Delta \to 0$, $N \to \infty$, and since $U_N$ is the sum of $N$ independent variates with a finite second moment, the distribution of $[U_N - E(U_N)]/\sigma_{U_N}$ approaches that of a normal variate with zero mean and unit variance. Hence we find the $N$ required for a test with Type I and Type II errors $\alpha$ and $\beta$ by solving for $N$ from the relations

\[ \frac{K}{\sqrt{E_0\left(\left[\frac{f'}{f}\right]^2_{\theta = \theta_0}\right)}} = U_\alpha, \tag{1} \]

\[ \frac{K - \sqrt{N}\,E_1\left(\left[\frac{f'}{f}\right]_{\theta = \theta_0}\right)}{\sqrt{E_1\left(\left[\frac{f'}{f}\right]^2_{\theta = \theta_0}\right) - \left[E_1\left(\left[\frac{f'}{f}\right]_{\theta = \theta_0}\right)\right]^2}} = -U_\beta. \tag{2} \]
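Now let $y = \left[\dfrac{f'(x, \theta)}{f(x, \theta)}\right]_{\theta = \theta_0}$, and we find from (1) and (2) that

\[ N = \frac{\left[U_\alpha\sqrt{E_0(y^2)} + U_\beta\sqrt{E_1(y^2) - [E_1(y)]^2}\right]^2}{[E_1(y)]^2}. \]

Now

\[ E_1(y) = \int_{-\infty}^{\infty} \frac{f'(x, \theta_0)}{f(x, \theta_0)}\, f(x, \theta_1)\,dx
          = \int_{-\infty}^{\infty} \frac{f'(x, \theta_0)}{f(x, \theta_0)}\, f(x, \theta_0)\,dx
            + \Delta\int_{-\infty}^{\infty} \frac{f'(x, \theta_0)}{f(x, \theta_0)}\, [f'(x, \theta)]_{\theta = \theta^*}\,dx \]

\[ \phantom{E_1(y)} = \Delta\,E_0(y^2)\,[1 + o(1)] \quad\text{from assumption (B).} \]

Proceeding in a similar manner, we find

\[ \left[U_\alpha\sqrt{E_0(y^2)} + U_\beta\sqrt{E_1(y^2) - [E_1(y)]^2}\right]^2 = E_0(y^2)\,[U_\alpha + U_\beta(1 + o(1))]^2. \]

We now have

\[ \frac{E_0(n)}{N} = \frac{\Delta^2\,[E_0(y^2)]^2\,(1 + o(1))^2}{E_0(y^2)\,[U_\alpha + U_\beta(1 + o(1))]^2}
   \times \frac{\alpha \log\left(\frac{1 - \beta}{\alpha}\right) + (1 - \alpha)\log\left(\frac{\beta}{1 - \alpha}\right)}{-\frac{\Delta^2}{2}\,[E_0(y^2) + o(1)]}; \]

therefore

\[ \lim_{\Delta \to 0} \frac{E_0(n)}{N} = -2\,\frac{\alpha \log\left(\frac{1 - \beta}{\alpha}\right) + (1 - \alpha)\log\left(\frac{\beta}{1 - \alpha}\right)}{(U_\alpha + U_\beta)^2}. \]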
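The limiting ratio depends only on $\alpha$ and $\beta$, so it can be tabulated directly. The sketch below assumes that $U_\alpha$ and $U_\beta$ are the upper $\alpha$ and $\beta$ points of the standard normal distribution, which is consistent with relations (1) and (2) but is stated here as an assumption. The function name is hypothetical, and `statistics.NormalDist` from the Python standard library supplies the normal quantiles.

```python
import math
from statistics import NormalDist

def limiting_ratio(alpha, beta):
    """Limit of E_0(n) / N as Delta -> 0, from the final formula of the note.

    U_alpha and U_beta are taken as upper-tail standard normal points
    (an assumption about the elided definition).
    """
    u_alpha = NormalDist().inv_cdf(1.0 - alpha)
    u_beta = NormalDist().inv_cdf(1.0 - beta)
    numerator = (alpha * math.log((1.0 - beta) / alpha)
                 + (1.0 - alpha) * math.log(beta / (1.0 - alpha)))
    return -2.0 * numerator / (u_alpha + u_beta) ** 2

if __name__ == "__main__":
    for a, b in ((0.01, 0.01), (0.05, 0.05), (0.05, 0.10)):
        ratio = limiting_ratio(a, b)
        print(a, b, round(ratio, 4), "saving:", round(100.0 * (1.0 - ratio), 1), "%")
```

For $\alpha = \beta = .05$ the ratio comes out close to 0.49, that is, the sequential test needs on the average roughly half as many observations as the fixed-sample test when $H_0$ is true.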
A NOTE ON THE POISSON-CHARLIER FUNCTIONS¹

By C. Truesdell
Naval Ordnance Laboratory

The polynomials p n (m, z) given by the definition 

( 1 ) Pn(m,z) ^ (-) m Sz' m £[e-’z m \, 

1 This note was written while the author was employed by the Radiation Laboratory, M.I.T.





called the Poisson-Charlier polynomials, and the associated function $\psi_n(m, z)$ given by the definition

(2) z) = p n {m, s), 

(3) 'Pobn, z) = —- , 

in' 

occur in statistics. Doetsch [1] has devoted a memoir to them, and they are noticed in Szegő's Orthogonal Polynomials (pp. 33-34).

I suggest that they are most directly and easily studied in connection with the "F-equation"

\[ \frac{\partial}{\partial z} F(z, \alpha) = F(z, \alpha + 1), \tag{4} \]

whose properties and application to various special functions I have summarized in a recent note [2]. Using the theorems of that note, which I shall cite by number, I shall now generalize the Poisson-Charlier polynomials and sketch the speediest derivation of their most interesting formal properties.

Greek letters shall represent unrestricted real numbers, while Latin letters shall represent integers.

From the existence theorem for the F-equation (Theorem 4) we know that there exists an integral function of $z$, $F_\beta(z, \alpha)$, which satisfies the F-equation and the condition

(5) Ffl(0, a) = cos(a + 0)tt ■ 

From the uniqueness theorem for the F-equation (Theorem 4) it follows that 

(6) F g (z, n — p + i) = 0, 

(7) F${z, n) = 0, n > 0. 

From the general power series solution for the F-equation (Theorem 4) we have 
the formula 

(8) Fp(z, a) = COS (a + P)tt ^_^^iFi(a; 0 + « + 1; *)■ 

We now define the Poisson-Charlier functions in general by the formulas 

(9) p fl («, Z) = r(a + 1)2 ~ a Ff,{z, -a), 

(W) M<*, s) = ^ ^ g)- 

From the formulas (6) and (7) we see that [1, p. 263] 

(11) (—n, z) = 0, n > 0; 
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(12) M~ n + (3 — z) = 0, pp(- n + $ — §, z) =0. 

From the formula (8) we see that 

(13) pp(a, z) = COS 0? - a)ir ^^ 2 _ “iFi( — a \/? — a + 1; z), 
whence it follows at once that 

(14) z) = cos /3 tt k\(-z)~\ 

This is the usual explicit expression for the Charlier polynomials [1, p. 257]. 
From formula (13) we see that 

(15) Vo(~ct, z) = r(l - a)z a y(ot , z). 

In the indeterminate case when ot is a negative integer we see from the formula 
(14) that 

(16) Po(wi, 2 ) = 1, m 2: 0. 

Hence 

(17) ^o(-a, a) = S -^~■ e~’y(a, z), 

(18) him, z) = ~~ . 

From the definition (10) we now see that 

(19) Mm, 2 ) = p 3 (m, z)Mm, 2 ), 

a generalization of the formula (2). From the formula (13) and the definition 
(10) we see that 

(20) hifi, 2 ) = cos (0 - a ) ,r r ( /3 + iTg Zf + 

Then by Kummer’s first transformation, 

(21) h(P,s) = COS (p - «)ir ^ + ^ + x) lFl<a + 1, a - IS +1 1 , 

from which it follows from the power series formula for solutions of the F-equa- 
tion (Theorem 4) that 1 p a {p, z) is a solution of the F-equation (4). 

We now have two different solutions of the F-equation based on the Poisson- 
Charlier functions: 

(A) 


F(z, a) = e*h(—a, 2). 
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(B) F(z, Ol) = Tpai.P, z). 

From the F-equation it is evident that 

(22) fjp,z) = *>&,*), 

whence we at once deduce the formula (1). Applying Taylor’s theorem for the 
F-equation (Theorem 8) to the solution (B) we see that [1, p 259] 

(23) t a (/3, z+h)=t h -j ia+Jp, z ); 

n =o n\ 

putting a. equal to zero we find that 

(24) _sin2/3ir e —* y( _ jS) s + h ) = £ fjfl, z), 

/iTT flaO 7b 

and, more specially [1, p. 260] 

(25) (l +*) m e- k = ± h - [Pn (rn,z). 

\ z) 7.=o n' 

Applying the same theorem to the solution (A) we obtain the formula 

°o yn 

(26) ffoict, z + h) = £ - M a ~ n , «)> 

71=0 71 


whence we recover the formula (11) by putting a equal to zero. 
Applying Theorem 9 to the solution (B) yields the result 


(27) 


il) t* lf>a+n((3, Z) = f 

71 = 0 *'0 


e~ s faiP, z + et) d9, 


which contains as a special case the formula 

(28) £f Pn (m,z) = (l+d^ 1 (!)’^ tW0 [™!-'y(» + l,«(l+^))_- 

Appell’s generating, expansion (see Theorem 10, part C or [3, p. 120]) applied 
to the solution (A) yields the result 

(29) 2 h( n > z + 2/) f " = 2 tpin, y)t n \ 

n = 0 ”-° 


hence 

(30) 


it pp(n, 2 + y) 

tj=0 71 ! 


e c*«/(»+») v ( yt \ n P^ n >v\ 
^0 \2 + y) n 1 


Putting y equal to zero and using the formula (13) we see that 
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(31) e) = e‘ (l ~ cos (hr. 

«_o n 1 \ 2/ 

Comparing this result with the formula (25) we see that 

(32) (-)>„(m, 2 ) = (-)"‘p m (n, z). 

It would be possible to proceed in this same fashion and discover many other formal properties of the Poisson-Charlier functions, but it is perhaps easier to notice from the formula (13) that

(33) pp(a, z) = cos (/3 — a)Tr(a: -p 1 )z~ a La~ a \z'). 

Lj(.r) being Laguerre’s function suitably generalized for complex lower index 
[4, p. 53], By means of this formula every relationship involving Laguerre 
functions may be translated into one involving Poisson-Charlier functions. 
1. Random Variables with Comparable Peakedness. Z. W. Birnbaum, University of Washington.

Let $U$ and $V$ be random variables with symmetrical distributions, i.e., with $P(U \le -T) = P(U \ge T)$ and $P(V \le -T) = P(V \ge T)$ for all $T \ge 0$. The random variable $U$ shall be called more peaked than $V$ if $P(|U| \ge T) \le P(|V| \ge T)$ for all $T \ge 0$. Let $X_1, Y_1$ and $X_2, Y_2$ be two pairs of independent random variables such that $X_i$ is more peaked than $Y_i$ for $i = 1, 2$. Then under certain additional conditions $X = X_1 + X_2$ is more peaked than $Y = Y_1 + Y_2$.

2. On Optimum Tests of Composite Hypotheses with One Constraint. Erich L. Lehmann, University of California, Berkeley.

The problem studied is that of finding all similar and bisimilar test regions of composite hypotheses, and of obtaining the most powerful of these regions. Various results are obtained for distributions which admit sufficient statistics with respect to their parameters. Applications are made to the hypothesis specifying the value of the circular correlation coefficient in a normal population, and certain hypotheses concerning scale and location parameters in exponential and rectangular populations.

3. Estimation of a Distribution Function by Confidence Limits. Frank J. Massey, Jr., University of California, Berkeley.

Let $x_1, x_2, \ldots, x_n$ be the results of $n$ independent observations, having the same cumulative distribution function $F(x)$. Form the function $S_n(x) = k/n$ where $k$ is the number of observations less than or equal to $x$. A confidence band $S_n(x) \pm \lambda/\sqrt{n}$ will be used to estimate $F(x)$. To determine the confidence coefficient it is necessary to find $\Pr\{\max_x |S_n(x) - F(x)| \le \lambda/\sqrt{n}\}$. It is sufficient to consider $x$ uniformly distributed in the interval (0, 1). Let $\lambda\sqrt{n} = s/t$ where $s$ and $t$ are integers. Then $S_n(x)$, to stay in the band $F(x) \pm \lambda/\sqrt{n}$, can only pass through certain lattice points above $x = i/tn$, $i = 1, 2, \ldots, tn$. The probability of $S_n(x)$ passing through a particular sequence of these points is given by the multinomial law, and this can be summed over all permissible sequences. Limiting distributions have been given by A. Kolmogoroff, and by N. Smirnoff. It is desired to test the hypothesis $F(x) = F_0(x)$ against alternatives $F(x) = F_1(x)$. Using the criterion reject $F_0(x)$ if

\[ \max_x \sqrt{n}\,|F_0(x) - S_n(x)| > \lambda \]

the probability of first kind of error can be controlled by choice of $\lambda$. A lower bound to the probability of second kind of error against alternatives such that $\max_x \sqrt{n}\,|F_0(x) - F_1(x)| \ge \Delta$ is given. This lower bound approaches one as $n \to \infty$. Thus the test is consistent.
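The test criterion in the abstract above can be computed exactly from the order statistics, because $S_n(x)$ is a step function and the largest deviation from a continuous $F_0$ must occur at, or just before, one of its jumps. The following is a minimal sketch under that observation; the function names and the sample are illustrative, and the critical value $\lambda$ is left to the caller.

```python
import math

def max_scaled_deviation(sample, cdf):
    """Return max over x of sqrt(n) * |F0(x) - S_n(x)| for a continuous cdf F0.

    S_n(x) = k / n, where k is the number of observations <= x.
    At the i-th order statistic S_n jumps from (i - 1) / n to i / n,
    so both sides of each jump are checked.
    """
    ordered = sorted(sample)
    n = len(ordered)
    deviation = 0.0
    for i, x in enumerate(ordered, start=1):
        fx = cdf(x)
        deviation = max(deviation, i / n - fx, fx - (i - 1) / n)
    return math.sqrt(n) * deviation

def reject(sample, cdf0, lam):
    """Reject F0 when the scaled maximum deviation exceeds lam."""
    return max_scaled_deviation(sample, cdf0) > lam

if __name__ == "__main__":
    data = [0.7, 0.1, 0.4]                       # illustrative observations
    uniform_cdf = lambda x: min(max(x, 0.0), 1.0)
    print(max_scaled_deviation(data, uniform_cdf))
    print(reject(data, uniform_cdf, lam=1.36))
```

For the three illustrative observations and the uniform distribution on (0, 1) the largest gap is 0.3, just below the last jump, so the statistic is $\sqrt{3}\times 0.3$.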

4. A Note on Sequential Confidence Sets. Charles Stein, Columbia University.

This paper generalizes a paper of Stein and Wald, appearing in the Annals of Math. Stat., Sept., 1947.

Let $\{X_i\}$, $(i = 1, 2, \ldots)$, be a sequence of random variables whose distribution depends on an unknown parameter $\theta$. Sequential confidence sets are determined by a rule indicating when to stop sampling and a rule giving the confidence set as a function of the sample. It is desired that, for each sample point, the confidence set should be one of a specified class $S$, that the probability of covering the true parameter should be $\ge \alpha$, and that the least upper bound of the expected number of observations should be minimized. If the $X_i$ are independent with the rectangular distribution on $(0, \theta)$ and $S$ consists of all intervals of the form $(\theta_0, k\theta_0)$ with $k$ fixed and $\theta_0$ a function of the sample, the optimum sequential procedure is the classical non-sequential procedure. If the $X_i$ are independently and identically distributed in accordance with a multivariate normal distribution with known covariance matrix $\Sigma$ but unknown mean $\theta$, and the confidence sets are to be of the form $(\theta - \theta_0)'\,\Sigma^{-1}(\theta - \theta_0) \le r$, $r$ fixed, $\theta_0$ a variable $p$-dimensional vector, a similar result holds, provided the desired confidence coefficient $\alpha$ is not excessively small.

5. Explicit Solution of the Problem of Fitting a Straight Line when Both
Variables are Subject to Error for the Case of Unequal Weights. Elizabeth
L. Scott, University of California, Berkeley.

Let $\alpha$, $\beta$ and $\xi_i$ $(i = 1, 2, \ldots, s)$, be unknown fixed numbers and let
$\eta_i = \alpha + \beta\xi_i$. For each value of $i$ there exist $m_i$ measurements $x_{ij}$ of
$\xi_i$ and $n_i$ measurements $y_{ik}$ of $\eta_i$, $(j = 1, 2, \ldots, m_i;\ k = 1, 2, \ldots, n_i)$.
The variables $x_{ij}$ and $y_{ik}$ are normally distributed about $\xi_i$ and $\eta_i$ with
variances $\sigma_1^2/u_i$ and $\sigma_2^2/v_i$, respectively, where the weights $u_i$ and $v_i$
are known but $\sigma_1^2$ and $\sigma_2^2$ are unknown. The numbers $m_i$ and $n_i$ are bounded
(usually small) while $s$ increases indefinitely. Thus $\alpha$, $\beta$, and the $\sigma^2$ appear
as structural parameters and the $\xi_i$ as incidental parameters. (See paper by J. Neyman and
E. L. Scott to appear in Econometrica.) Modified maximum likelihood equations (MMLE) yielding
consistent estimates of the structural parameters are tedious to solve when the products
$m_i u_i$ and $n_i v_i$ depend on $i$. The main result of this paper consists in proving that the
varying $m_i u_i$ and/or $n_i v_i$ can be treated as constants. Let $w_1$ and $w_2$ be the harmonic
means of $m_i u_i$ and $n_i v_i$, respectively. Now, MMLE's written with $m_i u_i = w_1$ and
$n_i v_i = w_2$ yield consistent estimates of $\alpha$ and $\beta$. The asymptotic variances are
also found. An application is made to certain problems of astronomy.
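The substitution described in this abstract replaces the varying products of the number of measurements and the weight by their harmonic means. A minimal sketch of that one step follows; the function name and argument layout are hypothetical, and the modified maximum likelihood equations themselves are not reproduced here.

```python
from statistics import harmonic_mean


def harmonic_weights(m, u, n, v):
    """Return (w1, w2), the harmonic means of the products m_i*u_i and n_i*v_i.

    m, n: numbers of measurements of each coordinate, one entry per point i.
    u, v: the known weights for each point i.
    """
    w1 = harmonic_mean([mi * ui for mi, ui in zip(m, u)])
    w2 = harmonic_mean([ni * vi for ni, vi in zip(n, v)])
    return w1, w2
```

With `m = [1, 2]`, `u = [2.0, 1.0]` every product equals 2, so `w1` is 2; the harmonic mean gives more influence to small products than the arithmetic mean would.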

6. Unbiased Estimates with Minimum Variance. Charles Stein, Columbia 
University. 

Let $X$ be a random variable distributed in the space $R$ according to one of the p.d.f.'s
$\varphi(x \mid \theta)$, where $\theta$ is an unknown parameter, and let $g(\theta)$ be a
real-valued function of $\theta$. Let $B(\theta)$ be the set of all $x$ such that
$\varphi(x \mid \theta) > 0$ but $\varphi(x \mid \theta_0) = 0$, and $S$ the set of all $\theta$
such that $B(\theta)$ has probability 0 when $\theta$ is the true parameter value. Let

$$\psi(x \mid \theta) = \varphi(x \mid \theta)/\varphi(x \mid \theta_0) \quad\text{and}\quad A(\theta_1, \theta_2) = E\{\psi(X \mid \theta_1)\,\psi(X \mid \theta_2) \mid \theta_0\}$$

for $\theta_1$, $\theta_2$ in $S$. Suppose $A(\theta_1, \theta_2)$ is everywhere finite and there
exists a set function $\lambda$ of bounded variation over $S$ such that
$\int_S A(\theta_1, \theta_2)\, d\lambda(\theta_2) = g(\theta_1)$. Then an estimate of $g(\theta)$,
unbiased for all $\theta$ in $S$ and having minimum variance at $\theta_0$, is given by
$f(x) = \int_S \varphi(x \mid \theta)\, d\lambda(\theta)/\varphi(x \mid \theta_0)$. The minimum
variance is

$$\int_S g(\theta)\, d\lambda(\theta) - [g(\theta_0)]^2.$$

If the definition of $f(x)$ is modified at a set having probability 0 when $\theta = \theta_0$,
the properties on $S$ and at $\theta_0$ remain unchanged. Under mild restrictions this alteration
can be carried out so as to make $f(x)$ an unbiased estimate of $g(\theta)$ for all $\theta$. The
results are related to the work of Fisher, Dugué, Rao, and Bhattacharyya on the amount of
information.

7. Sufficient Statistics and a System of Partial Differential Equations. (A
Contribution to the Neyman-Pearson Theory of Testing Hypotheses.) Preliminary
Report. Erich L. Lehmann, University of California, Berkeley, and
Henry Scheffé, University of California, Los Angeles.

In the Neyman-Pearson theory of testing hypotheses the problem of the existence and
determination of similar regions has been treated under two approaches: (1) assuming the
existence of a set of sufficient statistics for the nuisance parameters; (2) assuming that the
probability density satisfies a certain system of partial differential equations. By solving
the differential equations it is now shown that they imply the existence of sufficient statistics
for the nuisance parameters. Knowledge of the form of the solution of the differential
equations permits simplification of the known theory of optimum tests (type $B$, $B_1$, etc.)
as well as some generalization.

8. Power Function of the Analysis of Variance and Covariance Test of a 
Normal Bivariate Population. W. M. Chen, University of California, Berkeley. 

The problem of finding the power function of the analysis of variance and covariance test
of a normal bivariate population, $\rho = 0$ and $\sigma_1 = \sigma_2$, by means of the principle of
likelihood was reduced to the determination of the distribution function $P(L)$ of the following
moment problem:

$$\int_0 L^k \, dP(L) = \ldots \qquad (k = 1, 2, \ldots),$$

[The right-hand side, a combination of Gamma functions with a factor in $(1 - \delta)$, is not
recoverable from the source scan.]

where $\delta$, the argument of the power function, lies in the interval (0, 1) and vanishes only
when the hypothesis tested is true. The moment problem was found and solved by rather tricky
methods. The result is

$$P(L) = (1 - \delta)^{(n-1)/2} \cdots$$

[The remainder of the expression is not recoverable from the source scan.]

9. A Mathematical Model of the Relation between White and Yolk Weights 
of Birds’ Eggs. G. A. Baker, University of California, Davis. 

The purpose of such a model is to find a rational method of estimating a “best line” in
some sense which will represent the relation between white and yolk weights for some or all
species of birds. From data at hand it appears that birds within species may differ in
means and variances of weights and that the yolk and white weights are positively correlated.
Yolk and white weights within a species are functions of egg number. The standard
deviations of yolk and white weights for different species are approximately proportional
to mean values. The “true” means for yolk and white weights for different species do
not lie on a line because of biological differences between species with the same egg size.
The standard deviation of species deviations from a straight line depends on the size of the
egg (it may be proportional to a weighted sum of the yolk and white weights). If sampling
variances are sufficiently small they may be neglected and a straight line fitted assuming
both variables subject to error and non-uniform variance. The practicality of maximum
likelihood estimates is considered.

10. Statistical Analysis for a New Procedure in Sensitivity Experiments. A.
M. Mood, Iowa State College, and W. J. Dixon, University of Oregon.

In the language of biological assay the sensitivity experiment investigates the proportion
of subjects that respond to a given concentration, $x$, of a certain chemical. It is assumed
that only one test may be made on each subject. The new procedure is characterized by a
change in $x$ for each successive test, depending on the result of the preceding test: $x$ is
reduced to the next lower of a fixed set of concentrations for the next test if there is a
response and is increased to the next higher concentration if there is no response.
Observations are thus concentrated near the mean and few tests are made for values of $x$ where a
very large or very small proportion of subjects would respond. Assuming $x$ is normally
distributed, approximate maximum likelihood estimates are obtained for the mean and
standard deviation of $x$. These assume a form which is simple to compute. Choice of
optimum increments of $x$ for various situations is investigated.
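The stepping rule can be sketched as a short simulation, taking the concentration down one step after a response and up one step after a non-response, which is what keeps the tests near the mean. The function name, the representation of each subject by a critical concentration, and the clamping at the ends of the grid are assumptions made for illustration; the abstract's maximum likelihood estimates are not reproduced.

```python
def up_and_down(levels, start, thresholds):
    """Simulate the sequence of tests in an up-and-down sensitivity experiment.

    levels:     the fixed set of concentrations, in ascending order.
    start:      index into `levels` of the first concentration tested.
    thresholds: hypothetical critical concentration of each successive subject;
                a subject responds if the tested concentration reaches it.
    Returns a list of (concentration, responded) pairs, one per subject.
    """
    idx = start
    history = []
    for threshold in thresholds:
        responded = levels[idx] >= threshold
        history.append((levels[idx], responded))
        if responded:
            idx = max(idx - 1, 0)                # step down after a response
        else:
            idx = min(idx + 1, len(levels) - 1)  # step up after no response
    return history
```

With every subject's critical concentration at 2.5 and a grid of 1, 2, ..., 5, the tested concentrations alternate between 3 and 2, straddling the critical value.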

11. The Relation of Inbreeding to Calf Mortality. W. M. Regan, S. W. 
Mead, and P. W. Gregory, University of California, Davis. 

An analysis of calf mortality in the University of California dairy cattle breeding experiment
is presented. Calves up to 4 months of age that were born singly are included in the
study. Only those stillbirths and abortions from cows free from Brucellosis and health
and reproductive abnormalities were considered. A total of 774 Jersey and 258 Holstein
calves were included. Calves were classified according to inbreeding coefficients as follows:
Class I, the controls, 0.0 to 0.1249; Class II, 0.125 to 0.2449; Class III, 0.245 to 0.3749; and
Class IV, 0.375 and over. There was no relation between the number of abortions and the
degree of inbreeding. The stillbirths, too few to be statistically significant, tended to increase
as the coefficient of inbreeding increased. Following birth, however, mortality was correlated
with inbreeding in both males and females; in Classes III and IV it was greater for the males
than for the females, but the difference is hardly significant. The Jerseys
tended to be less viable than the Holsteins. Some of the increased mortality of the more
highly inbred animals could be accounted for by the action of two lethal genes, one controlling
an anomaly of the liver, the other an anomaly of the heart; there was no plausible
explanation for most of it. Within sex, inbreeding class, and breed there was considerable
variation in the mortality of the progeny of different sires. Some of these differences were
statistically significant.

12. Observations on Designs for Cooperative Field Tests. P. A. Minges,
University of California, Davis.

In California conditions vary so greatly between the principal production areas that it
is necessary to establish experimental plots in each of the areas if reliable information is
to be obtained regarding cultural practices. Most of these tests must be conducted on
ranches in cooperation with growers and local agricultural extension agents. The designs
of these tests should be relatively simple; the arrangement should be adjustable to work
into the growers' cultural practices and to permit the obtaining of yield records with a
minimum of interference to the growers' operations, yet the design must be adequate to
yield valid data. The randomized block design has proved the most useful, although paired
plots, factorials, split-plots and Latin squares have been used successfully under certain
conditions. The Latin square design is useful when a two-way variation is expected; otherwise
it is not usually very efficient. Where yield data are of prime importance, four replications
have been considered most practical. In tests such as variety trials, when factors
other than yields are important, two replications may be adequate. The size of the plot
has been varied to fit the crop, conditions of the field, and known soil variability. Plots
two rows wide and 50 to 135 feet long often have been used, frequently without guards
between plots. Since it is desirable to include checks (untreated controls) in most tests,
small plots will reduce the loss to the growers when the treatments prove beneficial. The
information derived from these tests is of most interest to growers and county agents, so
the data should be presented in tables that are easily read. The variability figure, which is
confusing to most people, probably can best be presented as the least significant difference.


13. Population Genetics. N. H. Horowitz, California Institute of Technology.

Population genetics attempts to describe the effects on the genetical structure of Mendelian
populations of factors such as mutation, selection, migration, and random fluctuations
due to sampling errors. These diverse elements are brought under a common viewpoint by
considering their effects on gene frequencies. Since change in gene frequency is the elementary
process of evolution, the above factors are causal agents of evolution. Mathematical
models illustrating the interplay of the various elements have been constructed by
Wright, Haldane, and Fisher. The nature of Mendelian inheritance is such that gene
frequencies remain constant in large populations not subject to net mutation, selection, or
migration pressures. Unbalanced pressures initiate evolutionary changes which continue
until equilibrium is reached at a new level of gene frequencies. Equilibrium frequencies are
determined by opposing pressures, e.g., opposing mutation rates, mutation opposed by
selection, etc. Equilibrium, stable or unstable, is also possible under selection alone. In
small populations, sampling errors among the gametes produce random fluctuations in gene
frequencies which, superimposed on the equilibrium values, result in probable distributions
of frequencies. The latter provide a mechanism for the evolution of characters, especially
biochemical syntheses, which depend on the simultaneous action of a number of individually
non-adaptive genes.
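The simplest case mentioned here, equilibrium under opposing mutation rates in a large population, can be illustrated numerically. The symbols are assumptions introduced for the sketch: q is the frequency of one allele, u the rate of mutation toward it, and v the rate of mutation away from it, all per generation, with no selection, migration, or sampling error.

```python
def next_frequency(q, u, v):
    """One generation of mutation pressure on the allele frequency q.

    A fraction u of the other allele mutates to this one, and a fraction v
    of this allele mutates away.
    """
    return q + u * (1.0 - q) - v * q


def equilibrium_frequency(u, v):
    """Frequency at which the two opposing mutation pressures balance."""
    return u / (u + v)


def iterate(q, u, v, generations):
    """Apply next_frequency repeatedly, starting from frequency q."""
    for _ in range(generations):
        q = next_frequency(q, u, v)
    return q
```

Starting from any initial frequency, repeated application of `next_frequency` approaches `u / (u + v)`, since the distance from equilibrium shrinks by the factor `1 - u - v` each generation.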


14. The Choice of Inspection Stringency in Acceptance Sampling by Attributes.
J. L. Hodges, Jr., University of California, Berkeley.

In acceptance sampling by attributes, the probability $p$ that an item will be defective is
taken to be a function $g(x, y)$ of the quality $x$ of the population and the stringency $y$ of
inspection. Let $n$, the number of items inspected, be fixed, and reject if the number of
defectives is $\ge k$. It may then be possible to satisfy a condition on the power function
with different values of $k$, by adjusting $y$ properly. This paper is concerned with the choice
of $k$ and $y$ in such situations. A criterion is given, and it is shown that the criterion is
approximately satisfied by $k = [n g(x_0, y)]$, where $x_0$ separates acceptable and non-acceptable
values of $x$, and $y$ maximizes

$$\frac{\partial g(x_0, y)/\partial x}{g(x_0, y)\,[1 - g(x_0, y)]}.$$

An asymptotic property of this approximation is shown. The method is applied to two
examples: (a) testing the mean bacterial density $x$ of a liquid by the dilution method, $y$
being the volume of liquid incubated, and (b) testing the variance $x$ of a normally distributed
dimension of known mean $m$ by applying gauges set at $m \pm 1/y$. The approximate
solution is found to be satisfactory in both cases for $n = 20$.
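For a fixed number of inspected items and a given probability of a defective, the chance of rejection is a binomial tail, which is the quantity the power function is built from. The sketch below assumes independent items; the function names are hypothetical, and reading the square brackets in the rule for k as "integer part" is an assumption about the notation.

```python
import math


def rejection_probability(n, k, p):
    """P(number of defectives >= k) among n independent items, each defective
    with probability p."""
    return sum(
        math.comb(n, d) * p**d * (1.0 - p) ** (n - d) for d in range(k, n + 1)
    )


def approximate_k(n, g0):
    """k = [n * g0], with g0 the defective probability at the boundary quality
    and [.] read as the integer part (an assumption)."""
    return math.floor(n * g0)
```

Evaluating `rejection_probability(n, k, g(x, y))` over a range of qualities x, for a hypothetical g and several choices of k and y, traces out the competing power functions the abstract compares.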
15. The Application of Learning Curves to Industrial Planning. Preliminary
Report. James R. Crawford, Lockheed Aircraft Corporation.

Learning curves are significant factors of analysis in industries producing quantities of
less than 20,000 units of a given article. Ship-building and airframe manufacture are the
two largest industries in this class. Learning curves occur where job costs are kept either
by individual unit or by lot, and also where achievement is measured against a standard.
Cost per unit plots against ordinal unit number as a straight line on logarithmic graph
paper. Learning curves are used to supplement time-studies, to determine the capacity of
tooling, to lay out budgets, and for estimating and bidding. The experience of individual
workers and of management is reflected in these analyses. The slope of the learning curve
is related to the amount to be learned. Plateaus occur which are related to the hiring of
new workers and to the relaxing of control measures. Other consistent minor patterns
occur which are related to specific conditions. Equations have been derived and tables
computed for five related forms of the learning curve. Graphic methods are satisfactory
except for bidding. This study covers a simple approach to an important problem of industrial
management. The findings in the industrial field may benefit research in the field
of the psychology of learning.
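A straight line on logarithmic paper corresponds to a power law, cost = a * n^b, with b the slope. The sketch below fits that line by ordinary least squares on the logarithms; this is only one plausible reading, since the five forms of the curve derived in the paper are not given in the abstract, and the function names are hypothetical.

```python
import math


def fit_learning_curve(unit_numbers, unit_costs):
    """Least-squares line through (log n, log cost).

    Returns (first_unit_cost, slope), so that cost is approximately
    first_unit_cost * n ** slope.
    """
    xs = [math.log(n) for n in unit_numbers]
    ys = [math.log(c) for c in unit_costs]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


def predicted_cost(n, first_unit_cost, slope):
    """Cost of the n-th unit on the fitted curve."""
    return first_unit_cost * n**slope
```

A negative slope means each doubling of the unit number multiplies the unit cost by the constant factor `2 ** slope`, which is why the curve is straight on logarithmic paper.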

16. Relative Effects of Inbreeding and Selection in Poultry. W. O. Wilson, 
University of California, Davis. 

Egg production rate, fertility, hatchability, and chick mortality records from the Iowa
State College Poultry Department's inbreeding project were studied. Statistics which
were calculated from the data included simple and partial regression of traits on inbreeding,
estimates of heritability by correlation between paternal half-sibs and by daughter-dam
regressions, and selection differentials. The net genetic gain or loss in merit per generation
was considered to be the sum of the product of selection differentials and heritability, plus
the product of regression of trait on inbreeding and increase in amount of inbreeding. The
amount of inbreeding that can be done in each of the traits was estimated when there was no
net loss or gain. Of the traits studied, the rank was in the following order: hatchability,
chick mortality, fertility, and egg production.
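The accounting described here can be written out for a single trait: gain from selection plus the (usually negative) effect of added inbreeding, and the increase in inbreeding at which the two cancel. The function and parameter names are hypothetical, and the numbers in the usage note are made up for illustration, not taken from the Iowa data.

```python
def net_genetic_gain(selection_differential, heritability,
                     regression_on_inbreeding, inbreeding_increase):
    """Net gain per generation: selection differential * heritability, plus
    regression of trait on inbreeding * increase in inbreeding."""
    return (selection_differential * heritability
            + regression_on_inbreeding * inbreeding_increase)


def break_even_inbreeding(selection_differential, heritability,
                          regression_on_inbreeding):
    """Increase in inbreeding per generation giving no net loss or gain.

    Meaningful when the regression on inbreeding is negative, i.e. when
    inbreeding depresses the trait.
    """
    return -selection_differential * heritability / regression_on_inbreeding
```

For example, with a selection differential of 10 units, heritability 0.2, and a regression of -0.5 units per unit of inbreeding, the break-even increase is 4 units of inbreeding per generation.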

17. The Rate of Genetic Gain in Egg Production in Progeny-Tested Flocks
as a Function of the Interval between Generations. Everett R. Dempster and
I. Michael Lerner, University of California, Berkeley.

The rate of genetic gain in a character for which selection is practiced depends, in addition
to the intensity of selection, on (1) the accuracy of selection, and (2) the average interval
between generations. These factors are not independent and exercise a pull in opposite
directions. Through the application of Wright's technique of path coefficients, comparisons
can be made between the expected rates of genetic gain in populations containing varying
proportions of breeding animals of different ages. The methods used involve the estimation
of correlations between genotypes and various selection indexes based on individual, sib
and progeny records in unculled populations as well as in populations whose range has been
restricted by previous selection. From these estimates the relative efficiencies of different
age distribution schemes of a breeding population can be determined. A specific solution
for such a situation in a flock bred for egg production will be presented as an illustration of
the problems and methods used in the study of the genetics of populations under artificial
selection.

18. Statistical Criteria of the Effectiveness of Selective Procedures. Preliminary
Report. R. E. Jarrett, University of California, Berkeley.

The “validity coefficient,” the standard error of estimate, the “index of predictive
efficiency,” the “selection ratio” of Taylor and Russell, Johnson's Gamma, and other statistical
devices have been suggested as indices of the effectiveness of selective programs. These
devices all suffer from the deficiency that they do not permit a satisfactorily precise estimate
of the dollar value of the increased output expected from the selection program and thus
leave unsettled the question as to whether or not the cost of such a program is justified.
The relationship between the correlation coefficient on the one hand and the mean value of
Y for an unselected population (Y being an objective output-type criterion), the standard
deviation of Y for an unselected population, and the mean value of Y for the upper Np
individuals selected on the basis of their high performance on the selective test X on the other
hand, provides the basis for estimating the increase in the mean output of a group of workers
selected on the basis of a testing program yielding any specified validity coefficient with the
criterion Y. Increase in productivity of selected workers is shown to be a function of the
validity coefficient, the rigorousness of selection, and the coefficient of variability of the
output criterion among “unselected” employees.
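One standard way to make this dependence concrete is to assume that test score X and output Y are bivariate normal with a linear regression of Y on X; that assumption, the function name, and the parameter names are not taken from the abstract. Under it, the proportional increase in mean output of the selected group is the product of the validity coefficient, the coefficient of variability, and the mean standard score of the selected fraction.

```python
from statistics import NormalDist


def relative_output_gain(validity, selected_fraction, coefficient_of_variation):
    """Proportional increase in mean output when the top `selected_fraction`
    of applicants on test X is hired.

    Assumes X and Y are bivariate normal (an assumption of this sketch).
    validity:                 correlation between X and Y.
    coefficient_of_variation: standard deviation of Y divided by mean of Y
                              among unselected workers.
    """
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - selected_fraction)
    # Mean standard score on X of those above the cutoff.
    mean_selected_score = std_normal.pdf(cutoff) / selected_fraction
    return validity * coefficient_of_variation * mean_selected_score
```

Multiplying the result by the mean dollar output of an unselected worker gives an estimate of the dollar gain per selected worker, which can then be set against the cost of the testing program.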

19. Approaches to Univocal Factor Scores. Preliminary Report. J. P. 
Guilford, University of Southern California. 

In spite of the fact that univocal factor scores are badly needed for various reasons, it
appears to be impossible by present methods to construct pure tests for some common
factors. Recourse must therefore be made to statistical control of component variances.
It is desirable to derive each factor score from a minimum number of tests. The availability
of a few univocal tests makes this requirement fairly easy to satisfy. Such tests serve well
as suppression variables for their common-factor variances where not wanted in other tests.
Several principles may be invoked as objectives: (1) to maximize the desired variance in
the impure test, (2) to reduce the undesired variance to zero, or (3) to minimize the undesired
variance without intolerable loss of the desired variance. A secondary objective is to
assure a combining weight of +1.00 for the test measuring the desired factor. Equations
for achieving the objectives have been derived and the limitations and implications of each
procedure have been noted. By means of statistical control, the situation seems hopeful
for the achievement of univocal scores for a fairly large number of unique psychological
variables. There are implications for experimental psychology as well as for vocational
testing.

20. A Note on the Problem of Binary Stars. Elizabeth L. Scott, University
of California, Berkeley.

This paper concerns some of the problems of Trumpler (see next abstract). $\xi_{ij}$ is the
radial velocity of the $i$-th star, $i = 1, 2, \ldots, s$, at time $t_j$ selected at random,
$j = 1, 2, \ldots, n$; $x_{ij}$, the measurement of $\xi_{ij}$, is $N(\xi_{ij}, \sigma_i)$.
$\xi_{ij}$ is random with distribution $c\,(k_i^2 - (\xi_{ij} - \xi_{i0})^2)^{-1/2}$,
where $k_i \ge 0$ and $\xi_{i0}$ are unknown. (1) Test of hypothesis that $k_i = 0$. Case (i):
$\sigma_i$ known. Whatever the exact test $T$, its power $\beta_T(k)$ has derivative
$\beta_T'(0) = 0$. The test maximizing $\beta_T''(0)$ is that of Trumpler with criterion

$$S^2 = \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2 > R\,\sigma_i^2.$$

Case (ii): $\sigma_i$ unknown. Whatever the exact test $T$, $\beta_T^{(m)}(0) = 0$, $m = 1, 2, 3$.
The test maximizing $\beta_T^{(4)}(0)$ is Trumpler's test

$$\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^4 > R \left[ \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2 \right]^2.$$

(2) Let $\sum_{j=1}^{n} (\xi_{ij} - \xi_{i0})^2 = 2\lambda\sigma^2$. For constant velocity stars
$\lambda = 0$. For others it is a random variable. Since, given $\lambda \neq 0$, $S^2$ is
distributed as non-central $\chi^2$, an integral equation connects the distributions of $S^2$
and $\lambda$. Its solution yields an estimate of the proportion of constant velocity stars. After
estimating the distribution of $\lambda$, the level of significance can be estimated and also the
number $n$ of measurements so that the proportion of constant velocity stars declared variable
will be less than $p$, specified in advance.

21. Statistical Problems of Spectroscopic Binaries. Robert J. Trumpler, 
University of California, Berkeley. 

Spectroscopic binaries are stars whose radial velocities, as measured by the Doppler
shift of spectral lines, show a periodic variation. The first problem is to obtain a statistical
criterion for deciding whether a star with several radial velocity measures, made at different
times, has a high probability (larger than a specified limit) of variable velocity and should
be announced as an object worthy of further study. The second problem is to find the
percentage of variable velocity stars among a large list of stars with several radial velocity
measures for each star. From the distribution of standard errors only the percentage of
cases where the velocity variation exceeds a certain limit can be ascertained. The third
problem is concerned with those stars for which a binary orbit has been determined. The
statistical distribution of these binary systems according to mean distance between the two
stars and the ratio of their masses can be evaluated within certain limits.



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items

Mr. Kenneth J. Arrow has been appointed Research Associate of the Cowles
Commission.

Dr. W. D. Baten, formerly of Michigan State College, is now Chief, Operations
Branch, Planning Section, Air Defense Command, Mitchel Field, New
York.

Dr. Paul T. Bruyere is now Chief of the Medical Records and Statistics Branch,
Army Institute of Pathology, Office of the Surgeon General, War Department.

Dr. A. C. Cohen received his discharge from the Army, with the rank of Lieutenant
Colonel, at the beginning of the spring quarter, and returned to his former
position at Michigan State College. He has accepted a position at the University
of Georgia beginning with the 1947 summer session there.

Dr. Hallett H. Germond has resigned from his position as professor of mathematics
at the University of Florida. He is now Director of Research for the
S. W. Marshall firm of Consulting Engineers, in New York City.

Dr. Meyer A. Girshick, formerly with the Department of Agriculture, is now
with the Douglas Aircraft Company in Santa Monica, California.

Dr. Clyde H. Graves has accepted a position as Operations Analyst, Operations
Analysis, Air Defense Command, Mitchel Field, New York.

Dr. E. J. Gumbel has been appointed to an Associate Professorship at Brooklyn
College.

Dr. Trygve Haavelmo has returned to Norway, and is at the University
Institute of Economics, Oslo.

Mr. Joseph O. Harrison, Jr., is now employed as a Mathematician in the
Computing Branch of the Ballistic Research Laboratories, Aberdeen Proving
Ground.

Dr. Wassily Hoeffding has accepted a position as Research Associate, The
Institute of Statistics, University of North Carolina, Chapel Hill.

Mr. Cyrus A. Martin is now an administrative analyst and statistician, assisting
Chief of Personnel Control of Signal Corps, in Washington, D. C.

Mr. Jack I. Northam has accepted an Assistant Professorship in the Department
of Mathematics, Kansas State College, Manhattan, beginning with the
1947 summer session.

Professor Henry Scheffé, who has been on leave for the past year, returned to
his position in the Engineering Department, University of California at Los
Angeles, in June.

Mr. Edward M. Schrock has accepted a position as Quality Control Engineer
with the General Electric Company at their Erie Works, Erie, Pa.

Mr. Jerome R. Steen, who has been manager of Quality Control Engineering
with the Sylvania Electric Products in Emporium, Pa., has now transferred with
the same company to Flushing, New York.


Professor Emeritus Irving Fisher, of Yale University, died April 29, 1947, 
at the age of eighty. 


In connection with the Atlantic City meeting of the American Chemical 
Society, April 14-18, 1947, a symposium on Statistical Methods in Experimental 
and Industrial Chemistry was held, in which several members of the Institute 
of Mathematical Statistics took part. The following program was presented 
Tuesday morning and afternoon, April 15: 

(1) Introductory Remarks. B. L. Clarke.

(2) The Management Viewpoint. George Smith.

(3) A New Technique for Testing the Accuracy of Analytical Data. W. J. Youden.

Discussion: Grant Wernimont, R. F. Moran, John Mandel, and Roland
H. Noel.

(4) Design of Experiments in Industrial Research. Hugh M. Smallwood.

(5) Statistical Training for Industry. Samuel S. Wilks.

Discussion: John Tukey, E. V. Lewis, Churchill Eisenhart, and C. West
Churchman.


Preliminary Actuarial Examinations 
Prize Awards 

The winners of the prize awards offered by the Actuarial Society of America 
and the American Institute of Actuaries to the nine undergraduates ranking 
highest in the combined score on Part 1 and Part 2 of the 1947 Preliminary
Actuarial Examinations are as follows: 


First Prize of $200

James H. Chung . . . . . . . . . . University of Toronto

Additional Prizes of $100

James F. A. Biggs . . . . . . . . Yale University
George Y. Cherlin . . . . . . . . Rutgers University
Frank H. David . . . . . . . . . . Harvard University
Thomas M. Galt . . . . . . . . . . University of Manitoba
Charles F. Pinzka . . . . . . . . Rutgers University
Philip C. Rapp . . . . . . . . . . University of Buffalo
Morton K. Schwartz . . . . . . . . Brown University
James G. C. Templeton . . . . . . University of Toronto


The two actuarial organizations have authorized a similar set of nine prize
awards for the 1948 Examinations.

The Preliminary Actuarial Examinations consist of the following three examinations:

Part 1. Language Aptitude Examination

(Reading comprehension, meaning of words and word relationships, antonyms,
and verbal reasoning)

Part 2. General Mathematics Examination

(Algebra, trigonometry, coordinate geometry, differential and integral
calculus)

Part 3. Special Mathematics Examination

(Finite differences, probability and statistics)

The 1948 Examinations will be administered by the College Entrance Examination
Board at centers throughout the United States and Canada on May 14-15,
1948.


Correction 

In the Directory of Members published in Vol. XVII, No. 4 (December 1946),
Professor Joseph Kampé de Fériet's name is listed in the F's under Fériet. It
should have appeared in the K's, under Kampé de Fériet.


New Members 

The following persons have been elected to membership in the Institute (March 1 to May 30,
1947):

Adams, Joe K. Ph.M. (Wisconsin) Graduate student and half-time instructor in Psychology,
Graduate College, Princeton University, Princeton, N. J.

Adams, Walter B. Communications Analyst, Civil Aeronautics Admin., Dept. of Commerce,
8253 S. Ingleside Ave., Chicago 19, Ill.

Aitken, Alexander C. D.Sc. (Edinburgh) Professor of Mathematics, University of Edinburgh,
28 Stirling Road, Edinburgh 5, Scotland

Brambilla, Francesco Ph.D. (Univ. L. Bocconi) Lecturer in Math. Statistics, Institute
of Statistics, University L. Bocconi, 6 via Panzacchi, Milano, Italy

Brown, George Middleton, D.Sc. (Michigan) Asst. Prof. of Math., Mich. State College, East
Lansing, Mich., 633 Cherry Lane

Bueno, Luiz de Freitas, E.E. (Mackenzie Coll.) Professor da Universidade de Sao Paulo,
Brazil, Rua Itambe 84-1, Casa 13

Burke, Cletus J., M.A. (U.C.L.A.) Res. Ass't, Univ. of Iowa, Iowa City, Iowa, 118 Riverside
Park

Cameron, Joseph M., M.S. (N. Car. State) Room 302 South Building, National Bureau of
Standards, Washington, D. C.

Carpenter, Osmer M.S. (Iowa State) Instructor, Mathematics Department, Iowa State
College, Ames, Iowa

Castellani, Maria D.Sc. (Rome) Visiting Professor, Department of Mathematics, University
of Kansas City, Kansas City 4, Mo.

Chernoff, Herman Sc.M. (Brown) National Research Council Pre-Doctoral Fellow, 8003
Wallace Ave., Bronx 67, N. Y.

Clark, Stanley M.Ed. (Saskatchewan) Student and teaching assistant, 1S01-7th St., S.E.,
Minneapolis 14, Minn.

Cover, John H. Ph.D. (Columbia) Director, Bureau of Business and Economic Res.,
Univ. of Maryland, College Park, Md.

Dailey, John T. M.S. (N. Texas Teachers Coll.) Res. Psychologist (Aviation), Psychological
Res. and Examining Unit, Sqn. E, Indoctrination Div., Air Training Command,
San Antonio, Texas

Darling, Donald A. Ph.D. (Calif. Inst. Tech.) Teaching Ass't, Calif. Inst. of Technology,
Pasadena 4, Calif. (As of July 1947, Dept. of Math., Cornell Univ., Ithaca, N. Y.)

Darmois, Georges D.Sc. (Paris) Prof. à la Faculté des Sciences de Paris, 7 Rue de l'Odéon,
Paris 6, France

Davies, J. Alfred M.A. (Alabama) Statistician, Design Eng. Section, General Electric
Co., 708 Hill Avenue, Owensboro, Kentucky

Dunnett, Charles W. M A. (Toionto) Student, 1044 John Jay Hall, Columbia Umv., 
New York 27, N Y, 

Egermayer, Frantisek Sc D. (Charles Umv , Prague) Chief of Section, State Statistical 
Office, 2 BZlskeho, Prague VII, Czechoslovakia. 

Flckenscher, Edgar H. A B. (Calif ) Graduate student and teaching ass’t, Umv. of Calif., 
1/fSO Acton St , Berkeley 2, Calif. 

Fraga, Constantino G. Jr. (Sao Paulo) Head, Dept, of Statistics, Instituto Agronomico, 
Campinas (S.P ), Brazil 

Frank, Elmore J. B.A. (Chicago) Instr. in Statistics, Ill. Institute of Tech., and Statistician,
Commercial Res. Dept., Armour and Co., SJflS Maryland Ave., Chicago 1G, Ill.

Frisch, Ragnar Ph.D. (Oslo) Professor, University Institute of Economics, Oslo, Norway

Geary, Robert C. D.Sc. Superintending Officer, Statistics Branch, Dept. of Industry and
Commerce, 27 Leeson Park, Dublin, Ireland.

Goodman, John R. M.S. (Iowa State) Head, Sampling Section, Survey Res. Center,
Univ. of Mich., Ann Arbor, Mich.

Gutman, Pierre M A (Columbia) Student, 7 Mountain Ave., Maplewood, N. J. 

Hartline, H. K. M.D. (Johns Hopkins) Assoc. Prof. of Biophysics, Johnson Res. Foundation,
Univ. of Pennsylvania, 36th and Spruce Sts., Philadelphia, Pa.

Hartog, Jacob A (Rotterdam) Rockefeller Fellow, 25 Fallen St., Cambridge, Mass.

Jacobs, Marcus A.B. (Penn.) Health Statistician, J^S9 S SGth St., Arlington, Va.

Jeeves, Terry A. A.B. (Calif.) Teaching ass't in math., Univ. of Calif., 2511 Hearst Ave.,
Berkeley 9, Calif.

Kempthorne, Oscar M.A. (Cambridge, England) Res. Assoc. Prof., Statistical Lab., Iowa
State College, Ames, Iowa

Kendall, David G. M.A. (Oxford) Fellow, Magdalen Coll., Oxford, England

Kent, Leonard M.B.A. (Chicago) Instr. in Statistics, School of Business, Univ. of Chicago,
Chicago 37, Ill.

Kupperman, Morton B.S. (C.C.N.Y.) Statistician, Office of the Surgeon General, War
Dept., 2829-27th St., N.W., Washington 8, D. C.

Lhati, Elizabeth L. M.A. (Michigan) Statistician, Bur. of Measurement and Guidance, 
Carnegie Institute of Technology, Pittsburgh 13, Pa. 

Levine, Harry D. B.S. (Chicago) Instr., Long Island Univ., 164- W. 96 St., New York 25,
N. Y.

Lichtenstein, Morris B.A. (Michigan) Statistician, 48il’ AT. Capitol St., N.B., Washington
11, D. C.

McMillan, Brockway Ph.D. (Mass. Inst. Tech.) Member, Technical Staff, Bell Telephone
Labs., Murray Hill, N. J.

Marshall, Andrew W. Student, 5757 University Ave., Chicago 37, Ill.

Metzner, Charles A. Ph.D. (Wisconsin) Study Director, Survey Research Center, Univ.
of Michigan, Ann Arbor, Mich.

Norton, John W. B.S. (California) Lab. Supervisor, Union Oil Co. of Calif., 5529 Macdonald
Ave., Richmond, Calif.

Otter, Richard Ph.D. (Indiana) Instructor, Fine Hall, Princeton Univ., Princeton, N. J.

Passos, Helena Rocha Penteado Diretor do Divisao do Depto. Estadual de Estatistica de
Sao Paulo, Avenida Angelica, 160, Apto. 6, Sao Paulo, Brazil

Priest, Edward I. B.S. (Columbia) Student in mathematics, 1204- E. 55th St., Chicago, Ill.

Quensel, Carl-Erik Fil. Dr. (Lund) Prof. at the University, Lund, Sweden, Linnegatan 14

Rankin, Mozelle M.A. (Ohio State) Ass't Instructor, Ohio State Univ., 107-14th Ave.,
Columbus 1, Ohio

Robb, Richard A. D.Sc. (Glasgow) Mathematics Lecturer and Mitchell Lecturer in
Statistics, Univ. of Glasgow, Glasgow, W. 2, Scotland

Ruist, Erik Fil. kand. (Stockholm) Amanuens, Industriens utredningsinstitut, Stockholm
16, Sweden

Sham, Inder M. M.A. (Punjab) Rothamsted Experimental Station, Harpenden, Herts,
England

Schneider, B. Aubrey Sc.D. (Johns Hopkins) Ass't Director, Dept. of Statistics and
Special Services, American Cancer Society, 47 Beaver St., New York 4, N. Y.

Seitz, Jiri Ph.D. (Prague) Kourimska 8, CSR, Praha XII, Czechoslovakia

Simaika, Jacques B. Ph.D. (London) Lecturer, Faculty of Science, Fuad I University,
Abbassia, Cairo, Egypt

Slatin, Benjamin M.A. (Columbia) Jr. Analyst, Econometric Institute, 179 Peshine
Ave., Newark 8, N. J.

Suydam, Bergen R. A.B. (N.Y. State Coll. for Teachers) Graduate student, Columbia
University, 1 W 706 St., Shanks Village, Orangeburg, N. Y.

Tashmuhamed, Sarymsakov Ph.D. (Moscow) President of the Academy of Sciences of
Uzb. SSR, Professor of the University, Tashkend, ul. Abdulli Tukaeva I, Tashkent,
USSR

Travers, Robert M. W. Ph.D. (Columbia) Examiner, and Assoc. Prof. of Education,
Bureau of Psychological Services, Univ. of Mich., Ann Arbor, Mich.

Weiner, Sidney B.S. (C.C.N.Y.) Student, New York University Graduate School, 1539
East 17th St., Brooklyn 30, N. Y.

Wezelman, Sol M. A.B. (Michigan) Graduate student, University of Michigan, Ann
Arbor, Mich., Burt St., Omaha, Nebr.

Wishart, John D.Sc. (London) Reader in Statistics, School of Agriculture, Cambridge,
England



REPORT ON THE NEW YORK MEETING OF THE INSTITUTE 

The Twenty-Sixth Meeting of the Institute of Mathematical Statistics was 
held in New York City on Thursday, April 24, and Friday, April 25, 1947, and 
was co-sponsored by the American Mathematical Society. This meeting was
devoted to a program on Stochastic Processes and Noise. The attendance of 
190 persons included the following 75 members of the Institute: 

F. S. Acton, C. B. Allendoerfer, F. L. Alt, T. W. Anderson, Jr., L. A. Aroian, W. D. Baten,
Robert Bechhofer, J. H. Bigelow, D. H. Blackwell, Paul Boschan, G. W. Brown, R. S. Burington,
B. H. Camp, E. W. Cannon, A. G. Carlton, K. L. Chung, P. C. Clifford, D. D. Cody,
Harald Cramér, H. B. Curry, J. H. Curtiss, R. L. Dietzold, J. L. Doob, Jacques Dutka,
Churchill Eisenhart, Benjamin Epstein, Will Feller, M. M. Flood, Bernard Friedman, C. P.
Gershenson, H. H. Goode, C. H. Graves, E. J. Gumbel, T. E. Harris, Millard Hastay, L. H.
Herbach, P. G. Hoel, Mark Kac, R. D. Keeney, T. C. Koopmans, William Kruskal, Jack
Laderman, J. E. Lieberman, S. B. Littauer, Melitta Lowy, P. J. McCarthy, Brockway McMillan,
Frederick Mosteller, L. F. Nanni, P. M. Neurath, G. E. Noether, M. L. Norden, C. O. Oakley,
P. S. Olmstead, G. B. Price, J. S. Rhodes, John Riordan, Selby Robinson, Frank
Saidel, Arthur Sard, F. E. Satterthwaite, G. R. Seth, C. E. Shannon, Jack Sherman, W. A.
Shewhart, Rosedith Sitgreaves, Andrew Sobczyk, Milton Sobel, Emma Spaney, C. M.
Stein, J. W. Tukey, D. F. Votaw, Jr., B. T. Weber, S. S. Wilks, Jacob Wolfowitz

The first session was held on Thursday morning, with Professor Carl Allendoerfer
of Haverford College serving as chairman. The following program
was presented:

Stochastic Processes:

Description, Professor J. L. Doob, Columbia University

Estimation, Professor Will Feller, Cornell University

Prediction, Professor N. Wiener, Massachusetts Institute of Technology

This meeting was concluded with a discussion by Dr. H. W. Bode, Bell Telephone
Laboratories, Professor Mark Kac, Cornell University, and Professor A. Wald,
Columbia University.

Dr. S. O. Rice, Bell Telephone Laboratories, was chairman of the Thursday
afternoon session. The following program was presented:

Stochastic Processes in Some Applications:

In Economics, Dr. T. Koopmans, Cowles Commission
In Insurance, Professor H. Cramér, Yale University
In Cosmic Radiation, Professor N. Arley, Princeton University
In Nuclear Physics, Dr. S. M. Ulam, Los Alamos Laboratory

The final session was held on Friday morning with Professor J. W. Tukey
of Princeton University as chairman. The program was as follows:

Different Ways of Describing Noise:

By a Noise Spectrum, Dr. C. E. Shannon, Bell Telephone Laboratories
By a Single Function, Mr. J. H. Bigelow, Institute for Advanced Study
By Many Functions, Professor Mark Kac, Cornell University
Round Table on Interrelations, Messrs. Shannon, Bigelow, Kac, and Rice

P. S. DWYER,
Secretary.

REPORT ON THE APRIL MEETING OF THE INSTITUTE IN ATLANTIC CITY

The Twenty-Seventh Meeting of the Institute of Mathematical Statistics was 
held in cooperation with the Eastern Psychological Association, on Saturday 
morning, April 26, 1947, in Atlantic City. This meeting was a Round Table 
on Certain Recent Statistical Developments, and its attendance of approximately 
100 persons included the following 9 members of the Institute: 

F S Acton, J. W Dunlap, Benjamin Epstein, Irving Lorge, P J. McCarthy, Frederick 
Mosteller, P J. Rulon, F E Satterthwaite, and Emma Spaney 

Professor Bernard Riess of Hunter College was chairman of the meeting. 
The following program was presented: 

Papers: Sequential Analysis.
Dr. Irving Lorge, Teachers College, Columbia University

Staircase Methods.
Dr. Philip J. McCarthy, Cornell University

Inefficient Statistics.
Dr. Frederick Mosteller, Harvard University

Discussion: Dr. Jack W. Dunlap, Psychological Corporation
Dr. Leon Festinger, Massachusetts Institute of Technology
Dr. William E. Kappauf, Princeton University
Dr. Joseph Zubin, New York Psychiatric Institute Hospital

P. S. DWYER,
Secretary.

REPORT ON THE SAN DIEGO MEETING OF THE INSTITUTE

The first Western Regional meeting of the Institute of Mathematical Statistics
was held in San Diego, California, June 17-19, 1947, jointly with the American
Association for the Advancement of Science. The meeting was attended by 53
persons, including the following 31 members of the Institute:

G. A. Baker, Joseph Berkson, Z. W. Birnbaum, H. C. Carver, Harald Cramér, J. R.
Crawford, Dorothy Cruden, W. J. Dixon, Robert Dorfman, M. W. Eudey, Evelyn Fix,
John Gurland, J. L. Hodges, Jr., J. M. Howell, H. M. Hughes, E. S. Keeping, E. L. Lehmann,
R. H. Lien, F. J. Massey, G. F. McEwen, Frederick Mosteller, S. W. Nash, Jerzy Neyman,
Kathryn B. Rolfe, Henry Scheffé, Herbert Solomon, C. M. Stein, Zenon Szatrowski, H. M.
Walker, J. D. Williams, Zivia S. Wurtele

The afternoon session on June 17 was a joint meeting with the Group of
Former Operations Analysts. The following program was presented under the
chairmanship of Col. Roscoe C. Wilson.

Topic: Statistical Problems in Operations Analysis.

Papers: Engineering and Statistics at the Pacific Front in World War II.
Roger Wilkinson, Bell Telephone Laboratories, New York City.

Present Organization and Activities of Operations Analysis.
Leroy A. Brothers, Operations Analysis, Asst. Chief of Air Staff-3, Washington, D. C.

Statistical Evidence of Bomb Release Malfunctions.
Mark W. Eudey, University of California, Berkeley

Study of Effectiveness of Certain Bombs Used Against German Industrial Targets.
J. Neyman, University of California, Berkeley

The morning session on June 18 was presented with Professor Alva R. Davis
as chairman, and the program was as follows:

Topic: Statistical Problems in Biology.

Papers: A Mathematical Model of the Relation between White and Yolk Weights of Birds'
Eggs.
G. A. Baker, University of California, Davis.

Statistical Analysis for a New Procedure in Sensitivity Experiments.
W. J. Dixon, University of Oregon, and A. M. Mood, Iowa State College.

The Relation of Inbreeding to Calf Mortality.
P. W. Gregory, University of California, Davis.

Cooperative Field Trials.
P. A. Minges, University of California, Davis.

Population Genetics.
N. H. Horowitz, California Institute of Technology.

Statistical Problems in Assessing Methods of Medical Diagnosis, with Particular
Reference to X-Ray Technique.
J. Yerushalmy, United States Public Health Service, Washington, D. C.

Discussion: J. Neyman, University of California, Berkeley

Professor John W. Miles was chairman of the afternoon session on June 18,
which was a joint session with the California Section of the American Society
for Quality Control. The following papers were presented:

Topic: Industrial Applications of Statistics.

Papers: Operating Characteristics of Average and Range Charts.
Henry Scheffé, University of California, Los Angeles

Sampling Inspection by Variables.
Herbert Solomon, Stanford University.

Some Exact Numerical Results for Sequential Acceptance Sampling by Attributes.
Mark W. Eudey, University of California, Berkeley

Choice of Inspection Stringency in Acceptance Sampling by Attributes.
Joseph L. Hodges, University of California, Berkeley.

Widening Tolerances for Closer Fitting Parts.
Edmond E. Bates, Quality Engineering Consultants, Los Angeles
Discussion: Russell O'Neill, University of California, Los Angeles.

Re-establishing Operator Responsibility for Quality Control.
Wyatt H. Lewis, General Electric Company, Ontario, California
Discussion: William B. Rice, Plomb Tool Company, Los Angeles.

The Application of Learning Curves to Industrial Planning.
James R. Crawford, Wright Field, Dayton, Ohio

The Wednesday evening session was under the chairmanship of Professor
George Beadle, California Institute of Technology, with the following program:

Topic: Statistical Problems in Genetical Studies in Chickens.

Paper: Rate of Genetic Gain in Egg Production in Progeny-tested Flocks as a Function of
the Interval between Generations.
Everett R. Dempster and I. Michael Lerner, University of California, Berkeley.

On Thursday morning, June 19, there was a joint session with the Western
Psychological Association. Professor Helen Walker of Columbia University was
chairman. The program was as follows:

Topic: Statistical Problems in Psychology.

Papers: Statistical Criteria of the Effectiveness of Selective Procedures.
R. F. Jarrett, University of California, Berkeley

Unsolved Statistical Problems Arising in Psychological Measurements.
Helen Walker, Columbia University

Cost Utility Curves as a Means of Assessing Batteries of Tests.
Joseph Berkson, Mayo Clinic

Approaches to Univocal Factor Scores.
J. P. Guilford, University of Southern California.

The afternoon session on June 19 was under the chairmanship of Professor
Harald Cramér of Stockholm, Sweden, and offered the following program:

Topic: Theory of Statistics and its Applications to Astronomy.

Papers: Random Variables with Comparable Peakedness.
Z. W. Birnbaum, University of Washington.

Distributions which Lead to Regressions Representable by Polynomials.
Evelyn Fix, University of California, Berkeley

Optimum Tests of Composite Hypotheses with One Constraint.
Erich L. Lehmann, University of California, Berkeley

Estimation of a Distribution Function by Confidence Limits.
Frank J. Massey, Jr., University of California, Berkeley

A Note on Sequential Confidence Sets.
Charles Stein, Columbia University

Certain Types of Statistical Problems in Astronomy.
Robert J. Trumpler, University of California, Berkeley

Basic Concepts of the Theory of Statistics in Relation to Certain Problems of
Astronomy.
J. Neyman, University of California, Berkeley

A Note on the Problem of Binary Stars.
Elizabeth L. Scott, University of California, Berkeley

Explicit Solution of the Problem of Fitting a Straight Line when both Variables
are Subject to Error for the Case of Unequal Weights. (By title)
Elizabeth L. Scott, University of California, Berkeley.

Power Function of the Analysis of Variance and Covariance Test of a Normal
Bivariate Population. (By title)
Way Ming Chen, University of California, Berkeley

Unbiased Estimates with Minimum Variance. (By title)
Charles Stein, Columbia University.

Sufficient Statistics and a System of Partial Differential Equations. (By title)
Erich L. Lehmann, University of California, Berkeley, and Henry Scheffé,
University of California, Los Angeles.

On Wednesday evening, June 18, at 6 o'clock, there was a dinner for members
and guests, at the Hotel San Diego.

P. S. DWYER,
Secretary.

ON OPTIMUM TESTS OF COMPOSITE HYPOTHESES WITH ONE CONSTRAINT¹

By E. L. Lehmann
University of California, Berkeley 

Summary. This paper is concerned with optimum tests of certain composite 
hypotheses. In section 2 various aspects of a theorem of Scheffé concerning type
$B_1$ tests are discussed. It is pointed out that the theorem can be extended to
cover uniformly most powerful tests against a one-sided set of alternatives. 
It is also shown that the method for determining explicitly the optimum test 
region may in certain cases be reduced to a simple formal procedure. These 
results are used in section 3 to obtain optimum tests for the composite hypothesis 
specifying the value of the circular serial correlation coefficient in a normal 
distribution. A surprising feature of this example is the fact that for the simple 
hypothesis obtained by specifying values for the nuisance parameters no test 
with the corresponding optimum properties exists. 

In section 4 the totality of similar regions is obtained for a large class of prob¬ 
ability laws which admit a sufficient statistic. Some composite hypotheses 
concerning exponential and rectangular distributions are treated in section 5. 
It is proved that the likelihood ratio tests of these hypotheses have various op¬ 
timum properties. 

1. Introduction. In developing tests for a class of hypotheses three phases
may be distinguished. First, tests are obtained which are intuitively appealing;
next, it is shown that these tests have certain attractive features; finally, it is
proved that they are "best possible" tests.

In dealing with parametric hypotheses, the likelihood ratio principle is fre¬ 
quently used to obtain a reasonable test. For many of the tests so derived for 
normal and exponential distributions, the question of bias has been investigated. 
In most cases unbiasedness has been established; in the other cases, usually a test 
based on the same criterion but with the boundaries shifted, can be proved to be 
unbiased. Other desirable properties which likelihood ratio tests have been 
shown to possess, relate to the asymptotic behaviour of these tests as the sample 
sizes tend to infinity. An interesting problem which does not seem to have been 
treated is the question of admissibility of likelihood ratio tests, a test being ad¬ 
missible if its power can not be improved upon uniformly by any other test of 
the same level of significance. 

Investigations of optimum tests of composite hypotheses have been carried
through for many hypotheses concerning normal distributions. When the hypothesis
specifies the value of one parameter (hypothesis with one constraint),
uniformly most powerful one-sided and type $B_1$ (uniformly most powerful unbiased)
tests have been obtained. When the number of constraints is larger
than one, not so much can be expected. It has been shown for some of the tests
in this class that they have maximum average power uniformly over a family of
surfaces in the parameter space, or that they are uniformly most powerful with
respect to the subclass of tests whose power depends only on some function of
the parameters. (All optimum properties mentioned are relative only to the
class of all similar regions. This will be so throughout the paper and will usually
not be stated explicitly.)

¹ Presented at a meeting of the Institute of Mathematical Statistics in San Diego, June,
1947.

Two methods for finding uniformly most powerful or uniformly most powerful
one-sided regions and type $B_1$ tests, if they exist, are known. Neyman and Pearson
[1] developed a method for determining all similar regions, and applied it
to obtain uniformly most powerful one-sided tests of certain hypotheses. Neyman
[2, 3] extended the method to obtain, for certain hypotheses, the class of all
bisimilar (unbiased similar) regions, and Scheffé [4], developing the method
further, proved the existence of type $B_1$ tests for an important class of hypotheses.

A different method for obtaining all similar and bisimilar regions was devised
by P. L. Hsu and was used by him and other writers to prove various optimum
properties of the likelihood ratio tests for the general linear hypothesis, of Hotelling's
$T^2$ and of other tests [5, 6, 7, 8].

In the present paper we are concerned with applications of these two methods 
to composite hypotheses with one constraint. However, the applicability is not 
so restricted. In fact, the second method has been used mainly in connection 
with composite hypotheses with many constraints, and the author believes it to 
be suitable also for deriving optimum classification procedures. An essential 
restriction of both methods seems to be that a set of sufficient statistics must exist 
with respect to the parameters involved: with respect to the nuisance parameters 
so that all similar regions can be found, with respect to the parameters specified 
by the hypothesis so that there exists a best of all similar regions. 

Extensions of the existing theory based on the first method are obtained in 
section 2, and the theory is applied in section 3 to a hypothesis concerning a mul¬ 
tivariate normal distribution. Sections 4 and 5 are concerned with applications 
of the second method to problems to most of which the earlier method is not 
applicable, in particular to hypotheses concerning exponential and rectangular 
distributions, hitherto only treated from the likelihood ratio point of view. 

2. On the theory of optimum tests. 

2.1 One-sided tests. In an interesting paper [4], Scheffé determined the type
$B$ and type $B_1$ tests of a certain class of composite hypotheses specifying the
value $\theta_0$ of a parameter $\theta$ in the presence of nuisance parameters.

Scheffé's results can, in an obvious way, be extended to cover one-sided sets
of alternatives. To show this, consider the method used in [4]. Under certain
assumptions all tests² are found which satisfy the two conditions:

² The terms "the test $w$" and "the region [of rejection] $w$" will be used interchangeably.

(a) The power function $\beta_w$ at $\theta_0$ has a preassigned value $\epsilon$ (the level of significance),
independent of the nuisance parameters;

(b) the power function at $\theta_0$ has derivative 0 (condition of unbiasedness).

Then that test $w_0$ is determined for which, of all those satisfying (a) and (b),

(c) the second derivative at $\theta_0$, $\beta''_w(\theta_0)$, is as large as possible.

By definition $w_0$ is a type $B$ test. Under a certain additional assumption (this
is the convexity assumption of Scheffé's Theorem 2, which requires a certain second
derivative to be positive) it is shown that of all
tests satisfying (a) and (b), $w_0$ has maximum power against all alternatives,
i.e. is of type $B_1$.

If now we want to maximize the power against only the one-sided set of alternatives,
$\theta > \theta_0$, we determine that test $w_1$ of all those satisfying (a), for which

(d) the first derivative at $\theta_0$, $\beta'_w(\theta_0)$, is as large as possible.

Under a certain additional assumption (in Scheffé's notation this would be the
monotonicity assumption, which requires a certain first derivative to be positive)
it can then be shown that of all tests satisfying
(a), $w_1$ has maximum power against all alternatives $\theta > \theta_0$ (it also has
minimum power against all alternatives $\theta < \theta_0$), i.e. $w_1$ is uniformly most powerful
against alternatives $\theta > \theta_0$. We shall not carry through the discussion
in detail since Scheffé's argument applies step by step, with only the obvious
changes.
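
Conditions (a), (b) and (d) can be illustrated numerically in a much simpler setting than the one treated in [4]. The sketch below is an assumption-laden toy case, not part of the paper: a single observation $X \sim N(\theta, 1)$ with no nuisance parameters, and the function names are hypothetical. It evaluates the power function of a two-sided and of a one-sided region of level $\epsilon$ and estimates the derivative at $\theta_0$ by a central difference.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution (toy model, an assumption)


def power_two_sided(theta, theta0=0.0, eps=0.05):
    """Power of the region |x - theta0| > z for one observation X ~ N(theta, 1)."""
    z = STD.inv_cdf(1 - eps / 2)
    shift = theta - theta0
    # P(X < theta0 - z) + P(X > theta0 + z) under mean theta
    return STD.cdf(-z - shift) + 1 - STD.cdf(z - shift)


def power_one_sided(theta, theta0=0.0, eps=0.05):
    """Power of the region x > theta0 + z for one observation X ~ N(theta, 1)."""
    z = STD.inv_cdf(1 - eps)
    return 1 - STD.cdf(z - (theta - theta0))


def derivative(fn, t, h=1e-5):
    """Central-difference estimate of fn'(t)."""
    return (fn(t + h) - fn(t - h)) / (2 * h)


# Both regions have level eps at theta0 (condition (a)); only the two-sided
# region has zero slope there (condition (b)); the one-sided region instead
# has a positive slope, which is the quantity maximized in condition (d).
print(power_two_sided(0.0), derivative(power_two_sided, 0.0))
print(power_one_sided(0.0), derivative(power_one_sided, 0.0))
```

In this toy case the one-sided region has higher power than the two-sided one for $\theta > \theta_0$ and lower power for $\theta < \theta_0$, in line with the remark above on maximum and minimum power.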

2.2 Determination of the boundaries. Let $X_1, \cdots, X_n$ be $n$ random variables
with a joint probability density function $p$, depending on parameters $\theta_1$ and
$\theta = (\theta_2, \cdots, \theta_l)$. We shall denote the probability density function of a set of random
variables $X_1, \cdots, X_n$ whose distribution depends on a parameter $\theta$ by
$p(x_1, \cdots, x_n \mid \theta)$ or simply by $p(x_1, \cdots, x_n)$ when the dependence on $\theta$ is
clear from the context. The set of points $(x_1, \cdots, x_n)$ for which

$$p(x_1, \cdots, x_n \mid \theta)$$

is positive we shall denote by $W_+(\theta)$.
Let

$$\varphi_i(x_1, \cdots, x_n) = \frac{\partial}{\partial \theta_i} \log p(x_1, \cdots, x_n \mid \theta_1, \theta) \Big|_{\theta_1 = \theta_1^0} \qquad (i = 1, \cdots, l), \tag{2.1}$$

and let the random variable $\Phi_i$ be defined by

$$\Phi_i = \varphi_i(X_1, \cdots, X_n). \tag{2.2}$$

Then for testing the hypothesis $H$: $\theta_1 = \theta_1^0$ under the assumptions stated by
Scheffé, the type $B_1$ test $w_0$ is defined by the inequalities

$$\varphi_1 < k_1, \quad \varphi_1 > k_2 \qquad (k_1 < k_2) \tag{2.3}$$

where $k_1$, $k_2$ depend on $\theta_1^0$, $\theta$, $\varphi_2, \cdots, \varphi_l$ and are determined by the two equations³

$$\int_{k_1}^{k_2} \varphi_1^s \, p(\varphi_1, \varphi_2, \cdots, \varphi_l) \, d\varphi_1 = (1 - \epsilon) \int_{-\infty}^{\infty} \text{same} \qquad (s = 0, 1). \tag{2.4}$$

³ Although $k_1$ and $k_2$ may depend on $\theta$, $w_0$ is independent of $\theta$, as was shown in [4].

The equations (2.3) and (2.4) are not suitable for the determination of the
boundary of $w_0$. The variables have to be transformed so as to obtain for
$w_0$ an expression from which the calculation of the boundaries becomes feasible
(cf. [9]). This part of the work may be formalized in the following theorem.

Theorem 1. Let

$$\begin{aligned} U &= f(\Phi_1, \Phi_2, \cdots, \Phi_l) \\ V_i &= g_i(\Phi_2, \cdots, \Phi_l) \qquad (i = 2, \cdots, l), \end{aligned} \tag{2.5}$$

be a system of functions, continuously differentiable and with non-vanishing Jacobian
almost everywhere, and such that

(i) $U$ is a linear function of $\Phi_1$,

$$U = a \Phi_1 + b \tag{2.6}$$

with coefficients which may depend on $\Phi_2, \cdots, \Phi_l$ and such that⁴ $a(\Phi_2, \cdots, \Phi_l) > 0$,

(ii) it is possible to solve for $\Phi_2, \cdots, \Phi_l$ in terms of the $V$'s,

(iii) under the hypothesis $H$, $U$ is distributed independently of
$V = (V_2, \cdots, V_l)$.

Then the region $w_0$ is equivalent to the region

$$u < c_1, \quad u > c_2 \qquad (c_1 < c_2) \tag{2.7}$$

where $c_1$, $c_2$ are determined by

$$\int_{c_1}^{c_2} u^s p(u) \, du = (1 - \epsilon) \int_{-\infty}^{\infty} u^s p(u) \, du \qquad (s = 0, 1). \tag{2.8}$$

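Proof.

$$p(\varphi_1, \varphi_2, \cdots, \varphi_l) = p(u, v_2, \cdots, v_l) \left| \frac{\partial(u, v_2, \cdots, v_l)}{\partial(\varphi_1, \cdots, \varphi_l)} \right| = p(u) \, p(v_2, \cdots, v_l) \, \frac{\partial u}{\partial \varphi_1} \left| \frac{\partial(v_2, \cdots, v_l)}{\partial(\varphi_2, \cdots, \varphi_l)} \right|. \tag{2.9}$$

But

$$u = a(\varphi_2, \cdots, \varphi_l) \, \varphi_1 + b(\varphi_2, \cdots, \varphi_l) = \alpha(v_2, \cdots, v_l) \, \varphi_1 + \beta(v_2, \cdots, v_l) \tag{2.10}$$

so that, writing $c_i(v_2, \cdots, v_l) = \alpha k_i + \beta$, (2.4) reduces to

$$\int_{c_1(v_2, \cdots, v_l)}^{c_2(v_2, \cdots, v_l)} \left( \frac{u - \beta}{\alpha} \right)^s p(u) \, p(v_2, \cdots, v_l) \, du = (1 - \epsilon) \int_{-\infty}^{\infty} \text{same} \qquad (s = 0, 1) \tag{2.11}$$

and hence to

$$\int_{c_1(v_2, \cdots, v_l)}^{c_2(v_2, \cdots, v_l)} u^s p(u) \, du = (1 - \epsilon) \int_{-\infty}^{\infty} \text{same} \qquad (s = 0, 1) \tag{2.12}$$

which shows $c_1$ and $c_2$ to be independent of the $v$'s. Also obviously (2.3) transforms
into (2.7), which completes the proof.

⁴ A similar theorem holds when we assume $a(\Phi_2, \cdots, \Phi_l) < 0$.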

If $U$ is such that its distribution (when $\theta_1 = \theta_1^0$) is independent of $\theta$, $c_1$ and $c_2$
of Theorem 1 will depend only on the data of the problem: $\epsilon$, $n$, $\theta_1^0$. However, the
existence of constants $c_1$ and $c_2$ satisfying (2.8) still has to be proved. We may
show more generally the existence of $k_1$ and $k_2$ satisfying (2.4). A proof is immediately
supplied by an argument which was used by Neyman [10] and Wald
[11] to prove the existence of type $A$ tests, and which may be stated in the
following

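Lemma. Let $0 < \alpha < 1$, let $f(x) > 0$ and $\int_{-\infty}^{\infty} x^s f(x) \, dx < \infty$ for $s = 0, 1$. Then
there exist $A$, $B$ such that

$$\int_A^B x^s f(x) \, dx = \alpha \int_{-\infty}^{\infty} x^s f(x) \, dx \qquad (s = 0, 1). \tag{2.13}$$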
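
The two equations (2.13) can also be solved numerically. The following sketch is only an illustration under stated assumptions: the density is supported on a finite interval `[lower, upper]` (or truncated to one), integrals are approximated by Simpson's rule, and the names `simpson`, `bisect` and `lemma_limits` are hypothetical helpers, not anything from the paper. For each trial $A$ it first finds the $B$ giving the right mass ($s = 0$), then adjusts $A$ until the first-moment equation ($s = 1$) holds as well.

```python
import math


def simpson(g, a, b, n=1000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def bisect(fn, lo, hi, tol=1e-8):
    """Bisection for a sign change of fn on [lo, hi]."""
    f_lo = fn(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = fn(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lemma_limits(f, alpha, lower, upper):
    """Find A < B with the integral of x^s f over [A, B] equal to alpha times
    the integral over [lower, upper], for s = 0 and s = 1."""
    xf = lambda x: x * f(x)
    m0 = simpson(f, lower, upper)   # total mass
    m1 = simpson(xf, lower, upper)  # total first moment

    def upper_limit(A):
        # B such that the mass of [A, B] is alpha * m0 (the s = 0 equation)
        return bisect(lambda B: simpson(f, A, B) - alpha * m0, A, upper)

    def moment_gap(A):
        # residual of the s = 1 equation once the s = 0 equation is satisfied
        return simpson(xf, A, upper_limit(A)) - alpha * m1

    # largest admissible A: the interval [A, upper] carries exactly alpha * m0
    A_max = bisect(lambda A: alpha * m0 - simpson(f, A, upper), lower, upper)
    A = bisect(moment_gap, lower, A_max)
    return A, upper_limit(A)


# Example with an asymmetric density (an arbitrary choice for illustration).
f = lambda x: x * math.exp(-x)
A, B = lemma_limits(f, 0.9, 0.0, 40.0)
print(A, B)
```

The search over $A$ mirrors the continuity argument behind the lemma: at the smallest $A$ the interval carries too little of the first moment, at the largest $A$ too much, so the residual changes sign in between.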


3. Testing for circular serial correlation in a normal population. We now
apply the results of the previous section to obtain the optimum tests (i.e. uniformly
most powerful against the one-sided set of alternatives, type $B_1$ in the
two-sided case) for the hypothesis specifying the value of the circular serial correlation
coefficient in the normal population considered by Dixon [12]. (For
the literature on testing for non-circular serial correlation in normal populations
cf. [12].)

We assume

$$p(x_1, \cdots, x_n) = \frac{1 - \delta^n}{(\sqrt{2\pi} \, \sigma)^n} \exp \left[ -\frac{1}{2\sigma^2} \sum_{i=1}^n \left[ (x_i - \xi) - \delta (x_{i+1} - \xi) \right]^2 \right] \tag{3.1}$$

where $x_{n+1} = x_1$ and $|\delta| < 1$, and we test the hypothesis $\delta = \delta_0$. For testing
purposes only the value $\delta_0 = 0$ is of interest presumably; however, the family of
tests for arbitrary $\delta_0$ is required for estimating $\delta$ by means of confidence intervals,
and therefore the more general hypothesis is considered.

Making a transformation in one of the parameters we write

$$p(x_1, \cdots, x_n) = C(\delta, a) \exp \left[ a \left[ (1 + \delta^2) \sum_{i=1}^n (x_i - \xi)^2 - 2\delta \sum_{i=1}^n (x_i - \xi)(x_{i+1} - \xi) \right] \right] \tag{3.2}$$

where in the notation of the previous section $\theta_1 = \delta$, $\theta_2 = a$, $\theta_3 = \xi$.

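Theorem 2. For testing the hypothesis $\delta = \delta_0$ for the distribution (3.2)

(a) the type $B_1$ test exists and is given by

$$r < r_1, \quad r > r_2 \tag{3.3}$$

where

$$r = \frac{\sum_{i=1}^n (x_i - \bar{x})(x_{i+1} - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i, \tag{3.4}$$

and where $r_1$ and $r_2$ are determined by

$$\int_{r_1}^{r_2} \left( \frac{r}{1 + \delta_0^2 - 2\delta_0 r} \right)^s p(r) \, dr = (1 - \epsilon) \int_{-\infty}^{\infty} \left( \frac{r}{1 + \delta_0^2 - 2\delta_0 r} \right)^s p(r) \, dr \qquad (s = 0, 1); \tag{3.5}$$

(b) the uniformly most powerful similar region for testing $H$ against the alternatives
$\delta > \delta_0$ exists and is given by

$$r > r' \tag{3.6}$$

where $r'$ is determined by⁵

$$\int_{-\infty}^{r'} p(r) \, dr = (1 - \epsilon) \int_{-\infty}^{\infty} p(r) \, dr. \tag{3.7}$$
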


Proof. We compute

$$\begin{aligned} \varphi_1 &= C_1(\delta_0, a) + 2a \left[ \delta_0 \sum (x_i - \xi)^2 - \sum (x_i - \xi)(x_{i+1} - \xi) \right] \\ \varphi_2 &= C_2(\delta_0, a) + (1 + \delta_0^2) \sum (x_i - \xi)^2 - 2\delta_0 \sum (x_i - \xi)(x_{i+1} - \xi) \\ \varphi_3 &= -2na(1 - \delta_0)^2 (\bar{x} - \xi). \end{aligned} \tag{3.8}$$

There is no difficulty in checking the conditions of Scheffé's theorems [4].

Next we apply Theorem 1 of the previous section, and define

$$\begin{aligned} V_2 &= (1 + \delta_0^2) \sum (X_i - \bar{X})^2 - 2\delta_0 \sum (X_i - \bar{X})(X_{i+1} - \bar{X}) \\ V_3 &= \bar{X} - \xi \\ U &= \frac{\sum (X_i - \bar{X})(X_{i+1} - \bar{X})}{V_2}. \end{aligned} \tag{3.9}$$

Conditions (i) and (ii) of Theorem 1 are easily seen to be satisfied. To show that
$U$ is independent of $V = (V_2, V_3)$ we employ arguments which have recently
been used by various authors in a number of similar problems (cf. [13, 14, 15]).

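It is seen that an orthonormal transformation exists:

$$X_1, \cdots, X_n \rightarrow Y_1, \cdots, Y_n$$

such that

$$\begin{aligned} \sqrt{n} \, \bar{X} &= Y_1 \\ \sum_{i=1}^n (X_i - \bar{X})(X_{i+1} - \bar{X}) &= \sum_{i=2}^n \lambda_i Y_i^2 \\ \sum_{i=1}^n (X_i - \bar{X})^2 &= \sum_{i=2}^n Y_i^2. \end{aligned} \tag{3.10}$$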

⁵ A corresponding result holds for the other one-sided case.

Under $H$ the $Y$'s are distributed with probability density

$$p(y_1, \cdots, y_n) = C(\delta_0, a) \exp \left[ a \left[ k (y_1 - \sqrt{n} \, \xi)^2 + \sum_{i=2}^n \mu_i y_i^2 \right] \right] \tag{3.11}$$

where $k, \mu_2, \cdots, \mu_n$ depend on $\delta_0$ and where the $\mu$'s are all positive. Introducing
new variables

$$Z_i = \sqrt{\mu_i} \, Y_i \qquad (i = 2, \cdots, n), \tag{3.12}$$

and, then, generalized polar coordinates in the space of the $Z$'s,

$$R = \sqrt{\sum_{i=2}^n Z_i^2}, \quad \Psi_1, \cdots, \Psi_{n-2}, \tag{3.13}$$

we see that $Y_1$, $R$ and $(\Psi_1, \cdots, \Psi_{n-2})$ are completely independent. Also

$$V_2 = R^2, \qquad V_3 = \frac{1}{\sqrt{n}} (Y_1 - \sqrt{n} \, \xi)$$

while $U$, being homogeneous of degree 0 in the $Z$'s, is a function of the $\Psi$'s only.
This proves that $U$, $V_2$ and $V_3$ are completely independent. The type $B_1$ test
of $H$ is therefore given by

$$u = \frac{\sum_{i=1}^n (x_i - \bar{x})(x_{i+1} - \bar{x})}{(1 + \delta_0^2) \sum_{i=1}^n (x_i - \bar{x})^2 - 2\delta_0 \sum_{i=1}^n (x_i - \bar{x})(x_{i+1} - \bar{x})} < c_1, \quad u > c_2 \tag{3.14}$$

where $c_1$ and $c_2$ are determined by

$$\int_{c_1}^{c_2} u^s p(u) \, du = (1 - \epsilon) \int_{-\infty}^{\infty} u^s p(u) \, du \qquad (s = 0, 1). \tag{3.15}$$


We still have to show that this test is equivalent to the one defined by (3.3)
and (3.5). For $\delta_0 = 0$ this is trivial. Let us assume $\delta_0 < 0$. (The other
case goes through similarly.) The inequality $u < c_1$ is equivalent to

$$(1 + 2\delta_0 c_1) \sum (x_i - \bar{x})(x_{i+1} - \bar{x}) < c_1 (1 + \delta_0^2) \sum (x_i - \bar{x})^2 \tag{3.16}$$

and hence to

$$\frac{\sum (x_i - \bar{x})(x_{i+1} - \bar{x})}{\sum (x_i - \bar{x})^2} < \frac{c_1 (1 + \delta_0^2)}{1 + 2 c_1 \delta_0} \tag{3.17}$$

provided $1 + 2 c_1 \delta_0 > 0$. Suppose $1 + 2 c_1 \delta_0 < 0$, i.e. $c_1 > -\dfrac{1}{2\delta_0}$. Then⁶

$$P\{U < c_1\} \geq P\left\{ U < -\frac{1}{2\delta_0} \right\} = P\left\{ 0 < \sum (X_i - \bar{X})^2 \right\} = 1 \tag{3.18}$$

i.e. $P\{U < c_1\} = 1$, which would contradict (3.15). Similarly if $1 + 2 c_2 \delta_0 < 0$
we would have $P\{U > c_2\} = 0$ and hence our test would be one-sided and therefore
not unbiased. The inequalities $u < c_1$, $u > c_2$ are thus equivalent to the
inequalities $r < r_1$, $r > r_2$, and since

$$u = \frac{r}{1 + \delta_0^2 - 2\delta_0 r},$$

(3.5) also follows.

⁶ We denote the probability of an event $A$ by $P\{A\}$.
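
The relation between the statistics $r$ of (3.4) and $u$ of (3.14), and the corresponding mapping of the critical values in (3.17), can be sketched in a few lines. The function names and the sample data below are hypothetical and serve only to illustrate the algebra; nothing here computes the critical values themselves, which require the distribution of $r$.

```python
def centered_sums(x):
    """Return (sum of circular lag-one cross products, sum of squares) about the mean."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    cross = sum(d[i] * d[(i + 1) % n] for i in range(n))  # x_{n+1} = x_1
    square = sum(di * di for di in d)
    return cross, square


def circular_serial_correlation(x):
    """The statistic r of (3.4)."""
    cross, square = centered_sums(x)
    return cross / square


def u_statistic(x, delta0):
    """The statistic u of (3.14)."""
    cross, square = centered_sums(x)
    return cross / ((1 + delta0 ** 2) * square - 2 * delta0 * cross)


def u_from_r(r, delta0):
    """u expressed through r, as in the final display of the proof."""
    return r / (1 + delta0 ** 2 - 2 * delta0 * r)


def r_limit_from_c(c, delta0):
    """Critical value for r matching a critical value c for u, as in (3.17);
    valid when 1 + 2*c*delta0 > 0."""
    return c * (1 + delta0 ** 2) / (1 + 2 * c * delta0)


sample = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]  # made-up data for illustration
delta0 = -0.3
r = circular_serial_correlation(sample)
u = u_statistic(sample, delta0)
print(r, u, u_from_r(r, delta0), r_limit_from_c(u, delta0))
```

Because `r_limit_from_c` inverts `u_from_r`, a rejection rule stated for $u$ translates directly into one stated for $r$, which is the equivalence used at the end of the proof.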

The existence of type $B_1$ and uniformly most powerful one-sided tests of the
hypothesis $H$ is rather surprising. For when $a$ and $\xi$ are assumed known, neither
the type $A_1$ test nor the uniformly most powerful one-sided test of the simple
hypothesis $H'$: $\delta = \delta_0$ exists. This is easily seen by determining the most
powerful and the most powerful unbiased test against a specific alternative $\delta_1$
for the hypothesis $H'$ in the population

$$p(x_1, \cdots, x_n) = \frac{1 - \delta^n}{(2\pi)^{n/2}} \exp \left[ -\tfrac{1}{2} \left[ (1 + \delta^2) \sum x_i^2 - 2\delta \sum x_i x_{i+1} \right] \right]. \tag{3.19}$$

The distribution of the criterion $R$ was obtained by R. L. Anderson [16] (see
also [17]) for the case $\delta = 0$. Madow [15], using Anderson's result, found the distribution
for arbitrary $\delta$. (Approximations to the distribution have been studied
by various authors; for the literature on this cf. [18]. Recently Hsu [19] obtained
an asymptotic expansion.) A direct derivation for arbitrary $\delta$ may be
based on the following theorem of Cramér, which was communicated to the
author by Dr. P. L. Hsu.

Theorem 3. (Cramér) 7. If $X$, $Y$ are two random variables (not necessarily independent), $Y > 0$, then

(3,0) r{f 


where $\varphi_x$ and $\psi$ are the characteristic functions of $X - xY$ and $Y$ respectively, provided

(3.21) $\displaystyle\int \left| \dfrac{\varphi_x(t) - \psi(t)}{t} \right| dt < \infty.$


7 Differentiated forms of the theorem were given by R. C. Geary [Jour. Roy. Stat. Soc. Vol. 107 (1944) p. 56] and H. Cramér [Exercise 6 on p. 317 of Mathematical Methods of Statistics, Princeton Univ. Press (1946)].

Theorem 4. If

(3.22)

$(x_{n+1} = x_1)$ and if

(3.23) $R = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(X_{i+1} - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$

then


P{R > r] 

(3 24) 


2 «+ i /2 1 _ s n - 

n (1 - S)(l + S 2 - 2 Sr) 



(-1 ) 3+1 (X, - r) n ~ m 


■ jir -. 2jf7r 
sin — sin — 
n n 


1 + S 2 - 2SX f 


where the summation is extended over all integers $j$, $1 \le j \le \dfrac{n}{2}$, for which $\lambda_j > r$, and where

(3.25) $\lambda_j = \cos \dfrac{2\pi j}{n}.$

The proof of this theorem from Theorem 3 is straightforward and will only be indicated here. If $X$ and $Y$ denote the numerator and denominator of $R$ respectively, the characteristic functions of $Y$ and $X - rY$ may be obtained by the method of circulants (cf. [12, 17]). The integral on the right hand side of (3.20) is then easily evaluated by the theory of residues when $n$ is odd. In the case that $n$ is even, the integrand has two branch points, one in the lower and one in the upper half plane. These may be separated, and then again the method of residues may be applied.
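As a numerical companion to the criterion $R$ of (3.23), the following minimal Python sketch computes the circular serial correlation of a sample. The function names are illustrative and not from the paper. The check uses a standard circulant fact, stated here as an assumption rather than taken from the text: on a pure sinusoid of frequency $j$ the ratio equals $\cos(2\pi j / n)$.

```python
import math


def circular_serial_correlation(x):
    """Return R = sum (x_i - xbar)(x_{i+1} - xbar) / sum (x_i - xbar)^2,
    using the circular convention x_{n+1} = x_1."""
    n = len(x)
    xbar = sum(x) / n
    d = [xi - xbar for xi in x]
    # (i + 1) % n wraps the last observation around to the first.
    numerator = sum(d[i] * d[(i + 1) % n] for i in range(n))
    denominator = sum(di * di for di in d)
    return numerator / denominator


def sinusoid(n, j):
    """Hypothetical test signal: one cosine wave of frequency j over n points."""
    return [math.cos(2 * math.pi * j * i / n) for i in range(n)]


if __name__ == "__main__":
    n, j = 7, 2
    print(circular_serial_correlation(sinusoid(n, j)))
    print(math.cos(2 * math.pi * j / n))
```

The two printed values should agree up to rounding, which illustrates why cosines of multiples of $2\pi/n$ appear as the critical points of $R$.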


4. Similar regions. The problem of finding all regions similar to the sample space with respect to a parameter $\theta$ was solved by Neyman and Pearson [1] for a certain class of probability laws. In a later paper Neyman proved ([20], proposition IX) that if there exists a sufficient statistic $T$ for a parameter $\theta$, then $w$ is similar with respect to $\theta$ if it has the following structure: For the intersection $w(t)$ of $w$ with the surface $T = t$, the relative probability of $w(t)$ given $T = t$ has a constant value independent of $t$. We shall show in this section that for a large class of probability laws which admit a sufficient statistic for $\theta$ the regions with the above structure are the only ones that are similar with respect to $\theta$.

We consider samples from a univariate distribution and we distinguish three cases according as one, both or neither of the extremes of the range of the distribution depend on the parameter $\theta$. For the first of these cases (cf. Pitman [21]) we consider samples from a distribution with probability density

(4.1) $p(x) = \dfrac{f(x)}{b[k(\theta)]}, \quad k(\theta) < x < c,$

where $k(\theta)$ is a strictly monotone continuous function of $\theta$ and where $c$ may be infinite. Introducing a new parameter $\delta = k(\theta)$, the distribution of a sample from (4.1) is given by

(4.2) $p(x_1, \dots, x_n) = \dfrac{f(x_1) \cdots f(x_n)}{[b(\delta)]^n}, \quad \delta < x_i < c.$

To obtain the totality of regions $w$ similar with respect to $\delta$ let us denote by $W_1, \dots, W_n$ the portions of the sample space where the smallest of the $x$'s is $x_1, \dots, x_n$ respectively. For any region $w$ denote by $w_k$ the intersection of $w$ with $W_k$. Consider a transformation carrying $W_2, \dots, W_n$ into $W_1$, letting $y_1 = \min(x_1, \dots, x_n)$ and letting in $W_k$

(4.3) $y_2 = x_1,\ y_3 = x_2,\ \dots,\ y_k = x_{k-1},\ y_{k+1} = x_{k+1},\ \dots,\ y_n = x_n.$

Denote by $w'_k$ the image of $w_k$ under this transformation. The condition that $w$ be similar with respect to $\delta$,

(4.4) $\displaystyle\int_w \dfrac{f(x_1) \cdots f(x_n)}{[b(\delta)]^n}\, dx_1 \cdots dx_n = \epsilon,$

may be written in the form

(4.5) $\displaystyle\int_\delta^c \dfrac{f(y_1)}{[b(\delta)]^n} \sum_{k=1}^{n} \left[ \int_{w'_k(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n \right] dy_1 = n\epsilon \int_\delta^c \dfrac{f(y_1)}{[b(\delta)]^n} \left\{ \int_{W(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n \right\} dy_1$

where $W(y_1)$ denotes the region $y_1 < y_i < c$, $(i = 2, \dots, n)$, that is, the region of variation of $y_2, \dots, y_n$ given $y_1$, and where $w'_k(y_1)$ denotes the region of variation of $y_2, \dots, y_n$ given $y_1$ and $w'_k$. From (4.5) we obtain

(4.6) $\displaystyle\int_\delta^c f(y_1)\, \psi(y_1)\, dy_1 = 0$

where

(4.7) $\displaystyle\psi(y_1) = \sum_{k=1}^{n} \int_{w'_k(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n - n\epsilon \int_{W(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n.$

But (4.6) implies

(4.8) $\psi(y_1) = 0$ almost everywhere

and since we can only determine $w$ up to a set of measure 0, we may omit the qualification in (4.8). Therefore a necessary and sufficient condition for $w$ to be similar is

(4.9) $\displaystyle\sum_{k=1}^{n} \int_{w'_k(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n = n\epsilon \int_{W(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n$ for all $y_1$.

To see more clearly the structure of these regions, let us take $n = 2$. Equation (4.9) states that on each of the broken lines of Fig. 1 the relative probability of $w = w_1 + w'_2$ given $Y_1 = y_1$ is $\epsilon$, where the decomposition of this probability into its two components may vary with $y_1$.

[Fig. 1]

In general equation (4.9) states that on each hyperplane $Y_1 = y_1$ the relative probability of $w$ is independent of $y_1$. Since $Y_1 = \min(X_1, \dots, X_n)$ is a sufficient statistic for $\theta$, Neyman's theorem in this case does give all similar regions.

Next let us consider the case where both extremes of the range of the distribution depend on the parameter. We shall assume (cf. [21]) that $X_1, \dots, X_n$ are distributed with probability density

(4.10) $p(x) = \dfrac{f(x)}{g(\theta)}$ in $\theta < x < b(\theta)$

where $b$ is a strictly decreasing continuous function over an interval $[-\infty, b(-\infty)]$ and where $b[b(-\infty)] = -\infty$. These assumptions insure that there exists a unique number $a$, $-\infty < a < b(-\infty)$, such that $b(a) = a$.

Denote by $W_{ij}$, $(i, j = 1, \dots, n;\ i \ne j)$, the portion of the sample space where the smallest and the largest of the $x$'s are $x_i$ and $x_j$ respectively. Denote by $W_{ij1}$ and $W_{ij2}$ those portions of $W_{ij}$ where $x_i$ is greater than and less than $b^{-1}(x_j)$ respectively. For any region $w$ denote by $w_{ijk}$ the intersection of $w$ with $W_{ijk}$. Consider a transformation carrying the sample space into $W_{1n}$, letting $y_1 = \min(x_1, \dots, x_n)$, $y_n = \max(x_1, \dots, x_n)$ and in $W_{ij}$ letting $y_2, \dots, y_{n-1}$ denote the remaining $x$'s in the order of their subscripts. Next make a transformation carrying $W_{1n}$ into $W_{1n1}$, letting $z_1 = \max[y_1, b^{-1}(y_n)]$, $z_n = \min[y_1, b^{-1}(y_n)]$ and $z_k = y_k$ for $k = 2, \dots, n - 1$. Denote by $w'_{ijk}$ the image of $w_{ijk}$ in $W_{1n1}$.

Then $Z_n$ is a sufficient statistic for $\theta$ (cf. [21]) and there exist functions $f_1$, $g_1$ such that the density of $Z_n$ is given by

(4.11) $\dfrac{f_1(z_n)}{g_1(\theta)}$ in $\theta < z_n < a$

while the distribution of the remaining $Z$'s given $Z_n$ is independent of $\theta$.

The condition that $w$ be a similar region may now be written, analogously to (4.5), in the form

(4.12) $\displaystyle\int_\theta^a \dfrac{f_1(z_n)}{g_1(\theta)} \sum_{i,j,k} \int_{w'_{ijk}(z_n)} p(z_1, \dots, z_{n-1} \mid z_n)\, dz_1 \cdots dz_{n-1}\, dz_n = \epsilon \int_\theta^a \dfrac{f_1(z_n)}{g_1(\theta)}\, dz_n$

and hence by the argument which led to (4.6), as

(4.13) $\displaystyle\sum_{i,j,k} \int_{w'_{ijk}(z_n)} p(z_1, \dots, z_{n-1} \mid z_n)\, dz_1 \cdots dz_{n-1} = \epsilon$ for all $z_n$.

Thus in this case also Neyman's theorem gives the most general similar region.

For the case that neither extreme of the range of the distribution depends on the parameter $\theta$, it has been shown by various authors [22, 21, 23] under slightly varying assumptions concerning the regularity of the distribution function, that the existence of a sufficient statistic implies

(4.14) $p(x \mid \theta) = \exp[P(\theta) + T(x)Q(\theta) + R(x)].$

This (cf. [10]) is a special case of that for which Neyman and Pearson determined the totality of similar regions, however under the restriction that the moments of $\phi = \dfrac{\partial}{\partial\theta} \sum_{i=1}^{n} \log p(X_i)$ uniquely determine the distribution of $\phi$. We shall briefly indicate how this assumption may be avoided.

Let $X_1, \dots, X_n$ be a sample from (4.14), or, more generally (this is the case considered by Neyman and Pearson), let $X_1, \dots, X_n$ be distributed with probability density

(4.15) $p(x_1, \dots, x_n) = \exp[P(\theta) + u(x_1, \dots, x_n)Q(\theta) + v(x_1, \dots, x_n)]$

in a sample space $W_+$ which is independent of $\theta$. We shall assume that the set of values which $Q$ takes on contains at least some interval. Introducing $\delta = -Q(\theta)$ as a new parameter, we shall obtain all regions similar to $\delta$ (where the set of values of $\delta$ contains an interval) for the distribution 8

(4.16) $p(x_1, \dots, x_n) = \exp[P_1(\delta) - \delta \cdot u(x_1, \dots, x_n) + v(x_1, \dots, x_n)]$


under the assumption that 



5 ^ 0 except possibly on a set of measure 0. 


Let us for a moment assume that there exist functions $f_i(x_1, \dots, x_n)$, $(i = 2, \dots, n)$, with continuous partial derivatives almost everywhere and such that the transformation

(4.17) $y_1 = u(x_1, \dots, x_n);\quad y_i = f_i(x_1, \dots, x_n),\ (i = 2, \dots, n),$

is one to one on $W_+$ except possibly on a subset of measure 0. Applying this transformation we may write the condition of similarity in the form

(4.18) $\displaystyle\int_{-\infty}^{\infty} e^{P_1(\delta) - \delta y_1} \int_{w(y_1)} f(y_1, \dots, y_n)\, dy_2 \cdots dy_n\, dy_1 = \epsilon \int_{-\infty}^{\infty} e^{P_1(\delta) - \delta y_1} \int_{W(y_1)} f(y_1, \dots, y_n)\, dy_2 \cdots dy_n\, dy_1$

where $W(y_1)$ denotes the region of variation of $y_2, \dots, y_n$ given $y_1$, and where $w(y_1)$ denotes the region of variation of $y_2, \dots, y_n$ given $y_1$ and $w$. Furthermore $f(y_1, \dots, y_n)$ is independent of $\delta$. From the theory of bilateral Laplace transforms it is known that (4.18) implies that

(4.19) $\displaystyle\int_{w(y_1)} f(y_1, \dots, y_n)\, dy_2 \cdots dy_n = \epsilon \int_{W(y_1)} f(y_1, \dots, y_n)\, dy_2 \cdots dy_n$

which is the desired result.

More generally it may be shown that our assumption concerning $u(x_1, \dots, x_n)$ insures the existence of functions $f_i$, $(i = 2, \dots, n)$, such that under the transformation (4.17) no point $(y_1, \dots, y_n)$ has more than a denumerable infinity of counter images in $x$-space. Our proof can be modified to cover this case. The argument is similar to that used to obtain equations (4.9) and (4.13), which were also arrived at through many to one transformations.


8 An assumption that we can solve for $\theta$ as a function of $\delta$ is not needed since we can determine $P_1(\delta)$ by integrating the density (4.16) over $W_+$.


5. Testing exponential and rectangular distributions. In their fundamental 1928 paper [24] on likelihood ratio tests, Neyman and Pearson discussed various hypotheses relating to normal, exponential and rectangular distributions. Later they and other authors developed a theory of similar and bisimilar regions which made it possible to obtain optimum tests of many composite hypotheses with one constraint concerning normal populations. This theory however is not applicable to most hypotheses concerning exponential or rectangular distributions. We shall in this section obtain optimum tests of some hypotheses relating to those latter distributions, using the method of the previous section.

Let us first consider a sample $X_1, \dots, X_n$ from an exponential population, the probability density of the sample being

(5.1) $p(x_1, \dots, x_n) = \dfrac{1}{a^n} \exp\left[-\dfrac{1}{a} \sum (x_i - b)\right]$ if $x_i > b$, $(i = 1, \dots, n)$

and let us consider the two hypotheses $H_1$: $a = a_0$, $H_2$: $b = b_0$ where, without loss of generality, we shall take $a_0 = 1$, $b_0 = 0$. The likelihood ratio tests of both these hypotheses were shown to be completely unbiased by Paulson [25]. We shall prove

Theorem 5. The likelihood ratio tests of $H_1$ and $H_2$ are type $B_1$ and uniformly most powerful, respectively. The one-sided tests based on the likelihood ratio criterion for $H_1$ are the uniformly most powerful one-sided similar regions for testing this hypothesis.

Proof. In order to simplify the argument we shall give a detailed proof only for the restricted class of tests which are symmetric in the variables $X_1, \dots, X_n$.

For testing $H_1$ let us make the following transformation introduced by Sukhatme [26]:

(5.2) $Z_1 = nY_1,\quad Z_i = (n - i + 1)(Y_i - Y_{i-1}),\ (i = 2, \dots, n),$

where $Y_i$ is the $i$th of the $X$'s in order of magnitude. Then

(5.3) $p(z_1, \dots, z_n) = \dfrac{1}{a^n} \exp\left[-\dfrac{z_1 - nb}{a} - \dfrac{1}{a} \sum_{i=2}^{n} z_i\right]$ if $z_1 > nb$; $z_i > 0$ $(i = 2, \dots, n)$.


We want to determine all regions $w$ which under $H_1$ are similar to the sample space with respect to $b$, i.e. all regions $w$ satisfying

(5.4) $\displaystyle\int_{nb}^{\infty} e^{-(z_1 - nb)} \left\{ \int_{w(z_1)} \exp\left[-\sum_{i=2}^{n} z_i\right] dz_2 \cdots dz_n \right\} dz_1 = \epsilon = \epsilon \int_{nb}^{\infty} e^{-(z_1 - nb)}\, dz_1$

where $w(z_1)$ denotes the intersection of $w$ with the hyperplane $Z_1 = z_1$. Now (5.4) is equivalent to

(5.5) $\displaystyle e^{nb} \int_{nb}^{\infty} e^{-z_1} f(z_1)\, dz_1 = 0$ (for all $b$)

where

(5.6) $\displaystyle f(z_1) = \int_{w(z_1)} \exp\left[-\sum_{i=2}^{n} z_i\right] dz_2 \cdots dz_n - \epsilon$

and this in turn is equivalent to

(5.7) $f(z_1) = 0$ for all $z_1$.

Of all the regions $w$ satisfying (5.7) we want to determine the one which against a specific alternative, say $a_1$, has maximum power, i.e. for which

(5.8) $\displaystyle\int_{nb}^{\infty} \exp\left[-\dfrac{z_1 - nb}{a_1}\right] \int_{w(z_1)} \exp\left[-\dfrac{1}{a_1} \sum_{i=2}^{n} z_i\right] dz_2 \cdots dz_n\, dz_1$

is as large as possible. We thus see that $w$ will have the desired properties if $w(z_1)$ is determined according to the two conditions

(5.9) $\displaystyle\int_{w(z_1)} \exp\left[-\sum_{i=2}^{n} z_i\right] dz_2 \cdots dz_n = \epsilon$

and

(5.10) $\displaystyle\int_{w(z_1)} \exp\left[-\dfrac{1}{a_1} \sum_{i=2}^{n} z_i\right] dz_2 \cdots dz_n = \max.$

Hence by the Neyman-Pearson fundamental lemma $w(z_1)$ is the set of points satisfying

(5.11) $\displaystyle\exp\left[\left(-\dfrac{1}{a_1} + 1\right) \sum_{i=2}^{n} z_i\right] > C(a_1, z_1)$

and therefore according as $a_1$ is greater or less than 1, $w(z_1)$ is determined by

(5.12) $\displaystyle\sum_{i=2}^{n} z_i = \sum_{i=1}^{n} [x_i - \min(x_1, \dots, x_n)] > k(a_1, z_1)$, or $\displaystyle\sum_{i=2}^{n} z_i = \sum_{i=1}^{n} [x_i - \min(x_1, \dots, x_n)] < k'(a_1, z_1).$

But $\sum_{i=2}^{n} Z_i$ is independently distributed of $Z_1$ and under $H_1$ the distribution of $\sum_{i=2}^{n} Z_i$ does not depend on $a_1$; in fact $2\sum_{i=2}^{n} Z_i$ has a chi-square distribution with $2n - 2$ degrees of freedom. Thus $k$ and $k'$, as determined by (5.9), are independent of $a_1$ and the two tests (5.12) are uniformly most powerful one-sided.
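The transformation (5.2) and the statistic in (5.12) are easy to check numerically. The following Python sketch is illustrative only: the function names are hypothetical, and it computes the $Z_i$ from a sample and the one-sided test statistic $\sum [x_i - \min(x_1, \dots, x_n)]$. It does not compute critical values, which would come from the chi-square distribution with $2n - 2$ degrees of freedom.

```python
def sukhatme_transform(x):
    """Return [Z_1, ..., Z_n] with Z_1 = n*Y_1 and
    Z_i = (n - i + 1) * (Y_i - Y_{i-1}), where Y_1 <= ... <= Y_n
    are the ordered observations."""
    y = sorted(x)
    n = len(y)
    z = [n * y[0]]
    for i in range(2, n + 1):
        # y is 0-indexed, so Y_i is y[i - 1] and Y_{i-1} is y[i - 2].
        z.append((n - i + 1) * (y[i - 1] - y[i - 2]))
    return z


def scale_statistic(x):
    """Statistic of (5.12): the sum of the excesses over the sample minimum."""
    smallest = min(x)
    return sum(xi - smallest for xi in x)


if __name__ == "__main__":
    sample = [3.0, 1.0, 2.5, 4.0]
    z = sukhatme_transform(sample)
    print(z)                        # Z_1, ..., Z_n
    print(sum(z[1:]))               # Z_2 + ... + Z_n
    print(scale_statistic(sample))  # same value, computed directly
```

The identity $\sum_{i=1}^{n} Z_i = \sum_{i=1}^{n} X_i$ also holds, which is why the exponent of (5.1) separates into a term in $z_1$ and a term in $z_2, \dots, z_n$.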

Next we consider the more restricted class of unbiased similar regions. For $w$ to be unbiased we must have

(5.13) $\displaystyle\dfrac{\partial}{\partial a} \left\{ \dfrac{1}{a^n} \int_{nb}^{\infty} \exp\left[-\dfrac{z_1 - nb}{a}\right] \int_{w(z_1)} \exp\left[-\dfrac{1}{a} \sum_{i=2}^{n} z_i\right] dz_2 \cdots dz_n\, dz_1 \right\}_{a=1}$

$\displaystyle = \int_{nb}^{\infty} (z_1 - nb - n) \exp[-(z_1 - nb)] \int_{w(z_1)} \exp\left[-\sum_{i=2}^{n} z_i\right] dz_2 \cdots dz_n\, dz_1 + \int_{nb}^{\infty} \exp[-(z_1 - nb)] \int_{w(z_1)} \left(\sum_{i=2}^{n} z_i\right) \exp\left[-\sum_{i=2}^{n} z_i\right] dz_2 \cdots dz_n\, dz_1 = 0.$

The first of the integrals in the middle member equals

(5.14) $\displaystyle\int_0^{\infty} (z - n) e^{-z} \int_{w(z + nb)} \exp\left[-\sum_{i=2}^{n} z_i\right] dz_2 \cdots dz_n\, dz = \epsilon \int_0^{\infty} (z - n) e^{-z}\, dz = -(n - 1)\epsilon.$

Therefore

(5.15) $\displaystyle\int_{nb}^{\infty} e^{-(z_1 - nb)} \int_{w(z_1)} \left(\sum_{i=2}^{n} z_i\right) \exp\left[-\sum_{i=2}^{n} z_i\right] dz_2 \cdots dz_n\, dz_1 = (n - 1)\epsilon = (n - 1)\epsilon \int_{nb}^{\infty} e^{-(z_1 - nb)}\, dz_1$

or

(5.16) $\displaystyle\int_{nb}^{\infty} e^{-z_1} g(z_1)\, dz_1 = 0$ (for all $b$)

where

(5.17) $\displaystyle g(z_1) = \int_{w(z_1)} \left(\sum_{i=2}^{n} z_i\right) \exp\left[-\sum_{i=2}^{n} z_i\right] dz_2 \cdots dz_n - (n - 1)\epsilon.$

Thus finally the condition of unbiasedness reduces to

(5.18) $\displaystyle\int_{w(z_1)} \left(\sum_{i=2}^{n} z_i\right) \exp\left[-\sum_{i=2}^{n} z_i\right] dz_2 \cdots dz_n = (n - 1)\epsilon$

and we seek the region $w(z_1)$ which satisfies (5.9), (5.10) and (5.18).

By the fundamental lemma $w(z_1)$ is given by

(5.19) $\displaystyle\exp\left[-\dfrac{1}{a_1} \sum_{i=2}^{n} z_i\right] > \left[ C_1(a_1, z_1) \sum_{i=2}^{n} z_i + C_2(a_1, z_1) \right] \exp\left[-\sum_{i=2}^{n} z_i\right]$

which is equivalent to

(5.20) $\displaystyle\sum_{i=2}^{n} z_i < k_1(a_1, z_1),\ > k_2(a_1, z_1)$

where $k_1$ and $k_2$ are determined by (5.9) and (5.18), and are therefore independent of $z_1$ and $a_1$. Thus the region (5.20) which of all unbiased similar regions maximizes the power against the alternative $a = a_1$ is independent of $a_1$, and hence is a region of type $B_1$. This completes the proof since it is easily verified that (5.20) is equivalent to the likelihood ratio test.

The proof for regions which are not necessarily symmetric in the variables follows similarly if instead of the transformation (5.2) one uses a transformation $U_i = f_i(X_1, \dots, X_n)$ which is one to one and such that $U_1 = Z_1$, $U_2 = \sum_{i=2}^{n} Z_i$. The distribution of $U_3, \dots, U_n$ is then independent of $a$ and $b$ since $U_1$, $U_2$ are a pair of sufficient statistics for these parameters, and the proof carries over step by step.

Next we consider the hypothesis $H_2$: $b = 0$, and again we restrict ourselves to regions which are symmetric in the variables, although as before the proof can be modified to cover also nonsymmetric regions.

We first make the transformation to $Z_1, \dots, Z_n$ given by (5.2). In the $n - 1$ dimensional space of $Z_2, \dots, Z_n$, we then transform to new variables $U, \Psi_1, \dots, \Psi_{n-2}$ where $U = \sum_{i=2}^{n} Z_i$ and where the $\Psi$'s are the generalized polar angles. Obviously the distribution of the $\Psi$'s does not depend on $a$, since they are homogeneous of degree 0 in the $Z$'s. Furthermore the $\Psi$'s are independently distributed of $U$ since the probability density of the $Z$'s is constant over the hyperplanes $U = u$. Thus

(5.21) $p(z_1, u, \psi_1, \dots, \psi_{n-2}) = \dfrac{1}{a^n} \exp\left[-\dfrac{z_1 - nb}{a}\right] u^{n-2} e^{-u/a}\, p(\psi_1, \dots, \psi_{n-2}).$

We next introduce new variables

(5.22) $V = Z_1 + U$ and $T = \dfrac{Z_1}{Z_1 + U}$

and find

(5.23) $p(v, t, \psi_1, \dots, \psi_{n-2}) = \dfrac{1}{a^n} \exp\left[-\dfrac{v - nb}{a}\right] v^{n-1} (1 - t)^{n-2}\, p(\psi_1, \dots, \psi_{n-2})$ for $v \ge nb$, $\dfrac{nb}{v} \le t \le 1$.

For $w$ under $H_2$ to be similar with respect to $a$, we must have

(5.24) $\displaystyle\int_0^{\infty} \dfrac{1}{a^n} \exp\left[-\dfrac{v}{a}\right] v^{n-1} \int_{w_0(v)} (1 - t)^{n-2}\, p(\psi_1, \dots, \psi_{n-2})\, dt\, d\psi_1 \cdots d\psi_{n-2}\, dv = \epsilon \int_0^{\infty} \dfrac{1}{a^n} \exp\left[-\dfrac{v}{a}\right] v^{n-1}\, dv$

where $w(v)$ designates the intersection of $w$ with the hyperplane $V = v$, and where $w_0(v)$ denotes the part of $w(v)$ lying between the hyperplanes $t = 0$ and $t = 1$.

Hence the condition of similarity may be written as

(5.25) $\displaystyle\int_0^{\infty} e^{-v/a}\, v^{n-1} f(v)\, dv = 0$ for all $a > 0$

where

(5.26) $\displaystyle f(v) = \int_{w_0(v)} (1 - t)^{n-2}\, p(\psi_1, \dots, \psi_{n-2})\, dt\, d\psi_1 \cdots d\psi_{n-2} - \epsilon.$

By the uniqueness theorem for Laplace transforms, (5.25) implies $f(v) = 0$ for all $v > 0$, so that the condition of similarity finally reduces to

(5.27) $\displaystyle\int_{w_0(v)} (1 - t)^{n-2}\, p(\psi_1, \dots, \psi_{n-2})\, dt\, d\psi_1 \cdots d\psi_{n-2} = \epsilon.$

Of all similar regions, let us find the one which has maximum power. Obviously we want to include in $w(v)$ all points for which $t < 0$. In addition we want to choose $w_0(v)$ such that

(5.28) $\displaystyle\int_{w_b(v)} (1 - t)^{n-2}\, p(\psi_1, \dots, \psi_{n-2})\, dt\, d\psi_1 \cdots d\psi_{n-2} = \max$

where $w_b(v)$ is that part of $w(v)$ in which $\max\left(0, \dfrac{nb}{v}\right) \le t$.

If, for some alternative $b$, $w_0(v)$ is contained in $\dfrac{nb}{v} \le t \le 1$, then $w_b(v)$ and $w_0(v)$ coincide and hence (5.28) attains its maximum value $\epsilon$ whatever the position of $w_0(v)$ in $\dfrac{nb}{v} \le t \le 1$. If on the other hand $\dfrac{nb}{v}$ is so close to 1 that $\dfrac{nb}{v} \le t \le 1$ is too small to contain $w_0(v)$, then (5.28) attains its maximum for any $w_0(v)$ containing $\dfrac{nb}{v} \le t \le 1$. There exists therefore a unique $w_0(v)$ which maximizes (5.28) for all values of $b$ and $v$, namely the region defined by

(5.29) $C(v) \le t \le 1$

where $C$ is determined by (5.27).

Since under $H_2$ the statistics $V$ and $T$ are independent, $C$ does not depend on $v$. The test

(5.30) $t < 0,\ > C$

which we have just shown to be uniformly most powerful, is also the likelihood ratio test, which completes the proof of the theorem.

We shall finally consider an example of an optimum test in connection with a rectangular distribution. Let $X_1, \dots, X_n$ be independently and uniformly distributed over $(a, a + \theta)$, where $\theta$ is positive. For testing the hypothesis $H$: $a = a_0$, the test

(5.31) $\dfrac{y_1 - a_0}{y_n - y_1} < 0,\ > C$

where $Y_1$ and $Y_n$ are the smallest and the largest of the $X$'s respectively, is the uniformly most powerful of all similar regions.

The proof of this goes through very much like that for $H_2$ in Theorem 5. Without loss of generality we take $a_0 = 0$. Also again, to simplify the proof, we restrict ourselves to regions which are symmetric in the variables. We need the following lemma.

Lemma. Let $X_1, \dots, X_n$ be independently and uniformly distributed over $(a, a + \theta)$. Let $Y_i$ denote the $i$th $X$ in order of magnitude, and let

(5.32) $T_n = Y_n,\quad T_k = \dfrac{Y_k}{Y_{k+1}},\ (k = 1, \dots, n - 1).$

Then for $a > 0$

(5.33) $p(t_1, \dots, t_n) = \dfrac{n!}{\theta^n}\, t_n^{n-1} t_{n-1}^{n-2} \cdots t_2$

when

$a \le t_n \le a + \theta,\quad \dfrac{a}{t_n t_{n-1} \cdots t_{k+1}} \le t_k \le 1,\ (k = 1, \dots, n - 1).$

This is easily seen by applying the usual method of Jacobians. The inequalities describing the sample space of the $T$'s are equivalent to the following more convenient ones:

(5.34) $a \le t_n \le a + \theta,\quad \dfrac{a}{t_n} \le t_1 t_2 \cdots t_{n-1},\quad t_k \le 1,\ (k = 1, \dots, n - 1).$

Let us denote by $w(t_n)$ the intersection of a region $w$ with the hyperplane $T_n = t_n$, and by $w_0(t_n)$ that part of $w(t_n)$ contained in the cylinder $0 \le t_k \le 1$, $(k = 1, \dots, n - 1)$; then we find as a necessary and sufficient condition for $w$ to be similar with respect to $\theta$ (assuming $H$)

(5.35) $\displaystyle (n - 1)! \int_{w_0(t_n)} t_{n-1}^{n-2} t_{n-2}^{n-3} \cdots t_2\, dt_{n-1} \cdots dt_1 = \epsilon.$

Of all regions satisfying (5.35) we want to find the most powerful one. Let us first consider alternatives $a > 0$. If $w_a(t_n)$ denotes the common part of $w_0(t_n)$ and the region

(5.36) $\dfrac{a}{t_n} \le t_1 t_2 \cdots t_{n-1} \le 1,$

we must choose $w_0(t_n)$ such that

(5.37) $\displaystyle\int_{w_a(t_n)} t_{n-1}^{n-2} t_{n-2}^{n-3} \cdots t_2\, dt_{n-1} \cdots dt_1 = \max.$

From this it follows easily that against alternatives $a > 0$ the uniformly best choice for $w_0(t_n)$ is

(5.38) $t_1 t_2 \cdots t_{n-1} = \dfrac{y_1}{y_n} > C'(t_n),$

and since under $H$, $\dfrac{Y_1}{Y_n}$ is independently distributed of $T_n$, $C'(t_n)$ does not depend on $t_n$.

Consider next alternatives $a < 0$. We include in the region of rejection all points for which $Y_1 < 0$. To determine $w_0(t_n)$ we notice that, given $Y_1 > 0$, the $X$'s are uniformly distributed between 0 and $a + \theta$. (Provided $a + \theta > 0$; the case $a + \theta \le 0$ is trivial.) Hence the probability distribution of the $T$'s given $Y_1 > 0$ is

(5.39) $p(t_1, \dots, t_n \mid Y_1 > 0) = \dfrac{n!}{(a + \theta)^n}\, t_n^{n-1} t_{n-1}^{n-2} \cdots t_2$

when

$0 \le t_n \le a + \theta,\quad 0 \le t_k \le 1$ for $k = 1, \dots, n - 1$.

Thus

(5.40) $\dfrac{p(t_1, \dots, t_{n-1} \mid t_n,\ a < 0,\ Y_1 > 0)}{p(t_1, \dots, t_{n-1} \mid t_n,\ a = 0)}$

is independent of $t_1, \dots, t_{n-1}$ and hence the power of $w$ against alternatives $a < 0$ is independent of the choice of $w_0(t_n)$. Therefore the region

(5.41) $y_1 < 0,\quad \dfrac{y_1}{y_n} > C$

is uniformly most powerful against all alternatives. But (5.41) is equivalent to

(5.42) $\dfrac{y_1}{y_n - y_1} < 0,\ > C.$

It is interesting to compare this result with that for the corresponding simple hypothesis. Let $H'$ be the hypothesis: $a = 0$ when the $X$'s are assumed independently and uniformly distributed over $(a, a + 1)$. There exists no uniformly most powerful test of $H'$; instead the two uniformly most powerful one-sided tests exist. By analogy with the normal case one might then expect for $H'$ that of all tests with symmetric power functions, there be a uniformly most powerful one. This however is not so: there exist infinitely many admissible tests with symmetric power function.

In this and the previous section we restricted ourselves to problems involving only one nuisance parameter. However, the method applies also to problems involving several nuisance parameters.

In the usual way (cf. [20, 9]) the results of this section may be translated to give optimum sets of confidence intervals for estimating the parameters in question. In this connection it is an open question whether the confidence regions based on the type $B_1$ tests discussed in section 2 will always be intervals; one would expect this to be the case.

The author wishes to acknowledge his indebtedness to Professor P. L. Hsu for many helpful suggestions.

Added in proof. In a joint paper by Professor Henry Scheffé and the present author which has been submitted to the Proceedings of the National Academy of Sciences, a result is given concerning the existence of certain 1:1 transformations. This result bears on Section 4 of the present paper where a question arises concerning the existence of a 1:1 transformation. The existence of such a transformation is now assured and, as a consequence, the last paragraph of Section 4 has become superfluous.

A CORNER TEST FOR ASSOCIATION

By Paul S. Olmstead and John W. Tukey
Bell Telephone Laboratories and Princeton University

1. Summary. This paper proposes a new test (the "quadrant sum") for the association of two continuous variables. Its notable properties are:

(1) Special weight is given to extreme values of the variables.

(2) Computation is very easy.

(3) The test is non-parametric.

Significance levels (for the quadrant sum) are given to the accuracy needed for practical use. To this accuracy they are independent of sample size (see Fig. 1). The generating function of the quadrant sum is given for the null hypothesis (no association = independence). A limiting distribution is deduced and compared with the cases $2n = 4$, 6, 8, 10, and 14. Extension to higher dimensions, and application to serial correlation are discussed.

2. Description of test (even number in sample). We shall describe the test as though a scatter diagram had already been drawn. The possibilities of direct computation from tabular data are indicated by the examples in sections 8 and 9.

In the scatter diagram, draw the two lines, $x = x_m$, $y = y_m$, where $x_m$ is the median of the $x$-values without regard to the values of $y$, and $y_m$ is the median of the $y$-values without regard to the values of $x$. Think of the four quadrants or corners thus formed as being labelled +, −, +, −, in order, so that the upper right and lower left quadrants are positive. Beginning at the right hand side of the diagram, count in (in order of abscissae) along the observations until forced to cross the horizontal median. Write down the number of observations met before this crossing, attaching the sign + if they lay in the + quadrant, and the sign − if they lay in the − quadrant. Repeat this process moving up from below, moving to the right from the left, and moving down from above. The quadrant sum is the algebraic sum of the four terms thus written down. This process is illustrated in Fig. 2, where the black dots represent contributions to the sum, and the dotted lines, crossings.
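The counting procedure just described can be sketched in Python as follows. This is a minimal illustration, assuming an even number of pairs and no ties, so that both medians fall strictly between observations. The function names are hypothetical and not from the paper.

```python
from statistics import median


def _run_term(points, order_key, descending, side_key, side_median, plus_when_above):
    """Count observations, taken in order from one side of the diagram,
    until the first one lying across the perpendicular median; return the
    count with the sign of the quadrant in which the run lay."""
    ordered = sorted(points, key=order_key, reverse=descending)
    first_above = side_key(ordered[0]) > side_median
    run = 0
    for p in ordered:
        if (side_key(p) > side_median) != first_above:
            break  # forced to cross the median: stop counting
        run += 1
    sign = 1 if first_above == plus_when_above else -1
    return sign * run


def quadrant_sum(points):
    """Quadrant sum for a list of (x, y) pairs (even count, no ties assumed)."""
    def get_x(p):
        return p[0]

    def get_y(p):
        return p[1]

    x_m = median(get_x(p) for p in points)
    y_m = median(get_y(p) for p in points)
    right = _run_term(points, get_x, True, get_y, y_m, True)     # upper right is +
    left = _run_term(points, get_x, False, get_y, y_m, False)    # lower left is +
    top = _run_term(points, get_y, True, get_x, x_m, True)       # upper right is +
    bottom = _run_term(points, get_y, False, get_x, x_m, False)  # lower left is +
    return right + left + top + bottom


if __name__ == "__main__":
    print(quadrant_sum([(1, 1), (2, 2), (3, 3), (4, 4)]))
```

For two pairs the sketch returns +4 or −4 according to the ordering of the $y$-values. A run can never extend past the other median, because only half of the observations lie on either side of it.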

When there are an even number of pairs $(x, y)$ and no ties, the medians will pass between the points. In this, the simplest case, the distribution of the quadrant sum is known for the hypothesis of no association (that is, of independence), and significance levels are given in Table 1 for the magnitude (absolute value) of the sum. It will be noticed that the sample size does not enter in any important way.

The cases of an odd number of observations and of ties are discussed in the next two sections. Simple devices make the test usable in most cases. A very great tendency toward ties, however, will make it inapplicable. This will be unimportant in most applications because of the fact that attention is being directed to the periphery.

The set of data which prompted the development of the test is shown in Fig. 2. The accompanying report described it as follows: "The various points appear to be scattered almost completely at random and give little indication of correlation." The quadrant sum is 16½, which is significant at the 0.5% point. Intuitively, the significant association of the peripheral points is clear.

[Fig. 2. Scatter diagram of 116 pairs of observations. Individual terms: top = +3, right = +1, bottom = +6, left = +6½. Quadrant sum: |S| = 16½, at the 0.5% point.]

3. Description of test (odd number in sample). If the sample size is odd, then we may usually follow the process outlined above. We will have difficulty only when the counting process meets a point, one of whose coordinates is a median. In this case we employ a simple device, namely:

Given a sample of $2n + 1$ pairs, let $x^*$ and $y^*$ be the medians of the $x$-values and of the $y$-values, respectively. Let the pairs in which they occur be $(x^*, y_k)$ and $(x_m, y^*)$, respectively. Replace these two pairs by the single pair $(x_m, y_k)$. There are now $2n$ pairs and the regular method can be applied.

The quadrant sum so obtained from an unassociated population has the same distribution as that formed directly from $2n$ pairs.
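The device for odd samples can be sketched in Python as below. The function name is hypothetical. One case is not covered by the text and is handled here by an assumption that is marked in the code: when a single pair carries both medians, that pair is simply dropped, which also leaves $2n$ pairs.

```python
def reduce_odd_sample(points):
    """Turn 2n + 1 pairs into 2n pairs: remove the pair (x*, y_k) holding
    the median x and the pair (x_m, y*) holding the median y, then add the
    single pair (x_m, y_k). Even-sized input is returned unchanged."""
    count = len(points)
    if count % 2 == 0:
        return list(points)
    by_x = sorted(range(count), key=lambda i: points[i][0])
    by_y = sorted(range(count), key=lambda i: points[i][1])
    ix = by_x[count // 2]  # index of the pair (x*, y_k)
    iy = by_y[count // 2]  # index of the pair (x_m, y*)
    reduced = [p for i, p in enumerate(points) if i not in (ix, iy)]
    if ix != iy:
        reduced.append((points[iy][0], points[ix][1]))  # the pair (x_m, y_k)
    # Assumption (not stated in the text): if one pair holds both medians,
    # it is removed and nothing is added.
    return reduced


if __name__ == "__main__":
    print(reduce_odd_sample([(1, 10), (2, 30), (3, 20), (4, 50), (5, 40)]))
```

In the example the median $x$ occurs in the pair (3, 20) and the median $y$ in the pair (2, 30), so the two are replaced by (2, 20).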

4. Description of test (treatment of ties). The behavior of the test is known when (1) there is no association, (2) the probability of a tie in $x$-values or $y$-values is zero. The following approximation, which has an unknown effect on the distribution, is suggested when ties are present:

When a tied group is reached, count the number in the tied group favorable to continuing and the number unfavorable. Treat the tied group as if the number of its points preceding the crossing of the median were

(number favorable) / (1 + number unfavorable).

It seems likely that this approximation is conservative.

TABLE 1

Working significance levels for magnitudes of quadrant sums

Significance level (conservative)    Magnitude of quadrant sum*
10%                                  9
5%                                   11
2%                                   13
1%                                   14-15
0.5%                                 15-17
0.2%                                 17-19
0.1%                                 18-21

* The smaller magnitude applies for large sample size, the larger magnitude for small sample size. Magnitudes equal to or greater than twice the sample size less six should not be used.
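The tie rule is a one-line formula. A small Python sketch (hypothetical function name, exact arithmetic via `fractions`) shows how a tied group contributes a fractional count to a term of the quadrant sum.

```python
from fractions import Fraction


def tied_group_count(favorable, unfavorable):
    """Number of points of a tied group treated as preceding the crossing:
    (number favorable) / (1 + number unfavorable)."""
    return Fraction(favorable, 1 + unfavorable)


if __name__ == "__main__":
    # One favorable and one unfavorable point in the tied group.
    print(tied_group_count(1, 1))
    # A tied group with no unfavorable points counts in full.
    print(tied_group_count(2, 0))
```

With no unfavorable points the whole group is counted, and each unfavorable point shrinks the contribution, so terms such as 6½ can arise.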

5. Discussion. When a moderate number, say 25 to 200, of paired observations on two quantities are plotted as a scatter diagram, visual examination frequently detects what seems to be definite evidence of association between the variables. Often in such cases, the usual methods for measuring association do not find statistical significance of association. Visual judgment, particularly by engineers or scientists who may wish to take action on the basis of their findings, gives greater weight to observations near the periphery of the scatter diagram. This is not always desirable, but often it is very desirable.

A quantitative test of association with such concentration on the periphery has been lacking. The quadrant sum test was developed to fill the gap. Its features of speed and non-parametricity are useful but secondary from this point of view.

When uniform attention to the whole scatter diagram is desired, the quadrant sum test is of unknown usefulness. We know little enough of the operating characteristics of the more conventional tests, such as:

1. The product moment correlation coefficient

2. The four-fold table formed by the medians

3. The biserial correlation coefficient

4. The rank correlation coefficient

and less about the operating characteristics of the present test. In this case, the quadrant sum test can only be recommended definitely for exploratory investigations of large amounts of data.

There are many situations, however, where we do not know where to concentrate our attention, and where speed and non-parametricity are cardinal virtues in a test. One example is the use of serial correlation in studying industrial processes. We may guess that here we are interested in the periphery, but neither theory nor experience can, so far, prove this. In such situations the quadrant sum is by far the fastest to use of any of the tests known to the authors, and we believe one of the most useful.

6. Elementary derivations. We can easily find the distribution of

1. An individual term of the quadrant sum
a. For fixed sample size
b. In the limit

2. The quadrant sum itself
a. For fixed sample size
b. In the limit, assuming asymptotic independence of the four terms.

This we shall do now, leaving the proof that 2a actually converges to 2b to a later section.

Consider a sample of $2n$ pairs $(x_1, y_1), \dots, (x_{2n}, y_{2n})$ from a population in which $x$ and $y$ are independent. It is both clear and easily verifiable that

1. The set of $2n$ $x$-values, $x_1, \dots, x_{2n}$

2. The set of $2n$ $y$-values, $y_1, \dots, y_{2n}$

3. The permutation of the order of the $y$-values when the pairs are ordered by the $x$-values

which together determine the sample, are independently distributed, and that any permutation is as likely as every other. (We have assumed no ties, which is a consequence, with probability one, of the continuous cumulative distributions of $x$ and $y$.) Since the quadrant sum depends only on the permutation, its distribution in the absence of association does not depend on the distributions of $x$ and $y$.

We must solve, then, certain purely combinatorial problems, under the 
hypothesis that the $(2n)!$ permutations of the y-values are all equally likely. 
It may simplify matters to assume that the values of x in the sample are 
$1, 2, \ldots, 2n$ and that those of y are the same. How, then, do we calculate the 
distribution of a single term of the quadrant sum? Let us begin with small x-values, 
and the pair $(1, y_1)$. If $y_1 = 1, 2, \ldots, n$, we count “one” positive, and if 
$y_1 = n + 1, n + 2, \ldots, 2n$, we count “one” negative. We pass on to $(2, y_2)$ 
and so on. How many permutations yield a count of exactly k positive values? 
Those in which $y_1, y_2, \ldots, y_k$ are equal to or less than n, $y_{k+1}$ equal to or greater 
than $n + 1$, and the other $(2n - k - 1)$ y's are arbitrary. There are: 

$$n(n - 1) \cdots (n - k + 1) \cdot n \cdot (2n - k - 1)!$$

such permutations, the fraction of all $(2n)!$ permutations being: 

$$\frac{n(n - 1) \cdots (n - k + 1)\, n}{(2n)(2n - 1) \cdots (2n - k + 1)(2n - k)} \tag{1}$$

which is, then, the probability that this contribution will equal $+k$, or by symmetry, 
the probability that it will equal $-k$, $k \neq 0$. 

For large n, this becomes merely: 

$$p_k = 2^{-(|k| + 1)}, \quad k \neq 0. \tag{2}$$
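
The two formulas above are easy to check numerically. The following is a minimal sketch, not part of the original derivation: the helper name `term_prob` is hypothetical, and exact rational arithmetic is used so that the check "positive counts carry half the probability" holds without rounding. For a large sample the values can be compared with the limiting form (2).

```python
from fractions import Fraction

def term_prob(n, k):
    """Probability from (1) that one term of the quadrant sum equals +k."""
    if not 1 <= k <= n:
        return Fraction(0)
    p = Fraction(n, 2 * n - k)      # y_{k+1} falls among the n large values
    for i in range(k):              # y_1, ..., y_k fall among the n small values
        p *= Fraction(n - i, 2 * n - i)
    return p

for n in (1, 2, 5, 50):
    # +k and -k are equally likely, so the positive values carry half the mass
    assert sum(term_prob(n, k) for k in range(1, n + 1)) == Fraction(1, 2)

# For a large sample the exact value is close to the limit 2^-(k+1)
for k in (1, 2, 3):
    print(k, float(term_prob(10**6, k)), 2.0 ** -(k + 1))
```

The loop mirrors the factors of (1) one by one, which makes it easy to see why each factor tends to 1/2 as n grows.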

In order to obtain the distribution of the quadrant sum itself, we must concern 
ourselves with the lack of independence of the four terms. This is indicated 
most clearly in the case of $2n = 2$, where the $2! = 2$ permutations yield 
$+1 +1 +1 +1 = 4$ and $-1 -1 -1 -1 = -4$. Here, there is complete lack 
of independence. We shall see later that there is effectively independence in 
the limit, so that it is worth while to calculate the sum of four independent 
terms with the limiting distribution (2) and find that it satisfies: 

$$\Pr(|\text{independent sum of 4 terms}| \ge k) = \frac{9k^3 + 9k^2 + 168k + 208}{216 \cdot 2^k}, \quad k > 0. \tag{3}$$

The details will be omitted. 
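
Since the details are omitted, a direct numerical convolution may be a useful cross-check. The sketch below is ours, under stated assumptions: each term follows the limiting law (2), the four terms are independent, and the law is truncated at |k| <= 60 (the neglected mass is of order 2^-60). The closed form in `tail_closed_form` is also an assumption: its numerator is the polynomial quoted under Table 2, and the denominator 216 · 2^k is the choice that agrees with the tabulated limiting column.

```python
def limiting_term(kmax):
    """Limiting law (2), p_k = 2^-(|k|+1) for k != 0, truncated at |k| <= kmax."""
    return {k: 2.0 ** -(abs(k) + 1) for k in range(-kmax, kmax + 1) if k != 0}

def convolve(p, q):
    """Distribution of the sum of two independent integer-valued variables."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out

def tail_closed_form(k):
    """Assumed closed form for Pr(|sum of four terms| >= k), k > 0."""
    return (9 * k**3 + 9 * k**2 + 168 * k + 208) / (216 * 2**k)

term = limiting_term(60)
two = convolve(term, term)
four = convolve(two, two)          # sum of four independent terms

def tail(k):
    return sum(p for s, p in four.items() if abs(s) >= k)

variance = sum(s * s * p for s, p in four.items())

for k in (1, 2, 5, 10):
    print(k, round(tail(k), 6), round(tail_closed_form(k), 6))
print("variance:", round(variance, 6))
```

The variance printed at the end is that of the limiting distribution, four times the variance of a single term.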

A simple device, reminiscent of Wald’s [3, 1943] establishment of the 
two-dimensional tolerance limits, enables us to avoid difficulties with lack of 
independence and compute the exact distribution of the quadrant sum for any n. 
We decompose the permutation of the y-values into the following parts, which 
together specify the permutation: 

(a) The number, j, of pairs in the upper right quadrant. 

(b) The set of j values of x between n + 1 and 2n corresponding to pairs in 
the upper right quadrant. 

(c) The set of j values of y between n + 1 and 2n corresponding to points 
in the upper right quadrant. 

(d) The set of j values of x between 1 and n corresponding to pairs in the 
lower left quadrant. (Note that the use of medians ensures that the 
lower left and upper right quadrants contain the same number of points.) 

(e) The set of j values of y between 1 and n corresponding to pairs in the 
lower left quadrant. 

(f) The permutation of j objects defined by the pairs in the upper right 
quadrant. 

(g) The permutation of n - j objects defined by the pairs in the upper left 
quadrant. 

(h) and (i) the permutations from the remaining quadrants. 

It is easily verified that: (1) given j, items (b) to (i) can be assigned at will, (2) 
each assignment of (a) to (i) corresponds to one and only one permutation, (3) the 
quadrant sum depends only on items (b) to (e). In fact, the right hand term 
depends on item (b), the upper term on item (c), the left hand term on item (d) 
and the lower term on item (e). While j remains fixed, the terms behave 
independently. 

For fixed j, what is the distribution of a single term? If a set of j x-values 
gives the term $+k$, it must contain the k largest x-values and not contain the 
next. There are: 

$$\binom{n - k - 1}{n - j - 1}$$

such sets. The generating function for a single term is, then: 

$$\sum_{k \ge 1} \binom{n - k - 1}{n - j - 1} x^{k} + \sum_{k \ge 1} \binom{n - k - 1}{j - 1} x^{-k}. \tag{4}$$

Since the terms are independent for fixed j, and there are $(j!)^2 ((n - j)!)^2$ 
ways to supply the permutations forming items (f) to (i), the generating function 
for the quadrant sum, $S_n$, is: 

$$G_n(x) = \sum_{j} \frac{(j!)^2 ((n - j)!)^2}{(2n)!} \left[ \sum_{k \ge 1} \binom{n - k - 1}{n - j - 1} x^{k} + \sum_{k \ge 1} \binom{n - k - 1}{j - 1} x^{-k} \right]^{4}. \tag{5}$$

The exact probability of equalling or exceeding each value of $S_n$ has been 
computed for $2n = 2, 4, 6, 8, 10$, and 14. Table 2 gives these probabilities 
and Fig. 3 shows the values of 

$$-\frac{1}{m} \log_{10} \Pr\{|\text{quadrant sum}| \ge m\}$$

this particular function being chosen for its relative constancy. The maximum 
value of the quadrant sum is $4n$, and for values of k less than $4n - 6$, there 



TABLE 2 

Probability of a Sum of Absolute Value Equal to or Greater than k when a Sample 
of 2n is Drawn from an Unassociated Population 



Variance 

of k 16 24 


  k     finite 2n     2n = ∞* 

  0      1.0000      1.000000 
  1      0.9115      0.912037 
  2      0.7580      0.754630 
  3      0.6039      0.599537 
  4      0.4690      0.462963 
  5      0.3547      0.346933 
  6      0.2611      0.252025 
  7      0.1876      0.177662 
  8      0.1322      0.121817 
  9      0.0918      0.081471 
 10      0.0632      0.053295 
 11      0.0432      0.034189 
 12      0.0296      0.021557 
 13      0.0202      0.013386 
 14      0.0139      0.008200 
 15      0.0096      0.004963 
 16      0.0066      0.002972 
 17      0.0045      0.001762 
 18      0.0031      0.001036 
 19      0.0021      0.000604 
 20      0.0014      0.000350 

* Probability for $2n = \infty$, $k > 0$, is given by 

$$\frac{9k^3 + 9k^2 + 168k + 208}{216 \cdot 2^k}$$



24 
is quite good agreement between the curves for finite n and formula (3) at 
the practically significant percentage points. The situation for very small 
probabilities suggests a careful consideration of the limiting behavior of the 
quadrant sum distribution (see section 10). 

Fig. 3. Comparative relationships for finite and infinite sample sizes and 
normal approximation to the infinite sample size 
The device for samples of 2n + 1 deserves a word of justification. If there 
is no association, the 2n + 1 y-values are randomly paired with the 2n + 1 
x-values, and, in particular, the y-value paired with the x-median is randomly 
selected. If we pair it with the (randomly selected) x-value which was paired 
with the y-median we still have random pairing. The pairing of the 2n pairs 
is random, although neither the x-values nor the y-values make up a sample. 
The randomness of pairing is all that has been used in the discussions of this 
section. 

7. Extension to higher dimensions. The same ideas that underlie the quadrant 
sum test for two variables may be extended in several ways to give tests 
for various types of association among three or more variables. Only one 
three-variable case will be discussed here, leaving further extension to the 
reader. 

Given three variables, x, y, and z, and a sample of matched observations on 
these, it is clearly possible to use the simple quadrant sum test for two variables 
to investigate association between x and y separately, between y and z separately, 
and between z and x separately. If the Pearson coefficient of correlation were 
being computed and were found to be close to zero for each of these pairs, it 
would be assumed that there was no detectable association through the second 
moments. In a trivariate normal or Gaussian distribution, where the first and 
second moments determine the whole distribution, if there is independence 
between the separate pairs of variables, there is no possibility of a three-way 
association. It is of some interest, however, to notice that a corner sum test 
can be devised that will measure the effect of such triple association in case it 
does exist. 

Consider the octants into which the three median planes for x, y, and z, 
respectively, divide the three dimensional scatter diagram and label the octants 
alternately plus and minus, in the manner suggested by Fig. 4. More precisely, 
an octant is counted as plus if an odd number, that is three or one, of the vari¬ 
ables are greater than the medians of the sample, and the remaining octants are 
labelled minus. It is clear that we may repeat the process of coming in along 
each axis passing from observation to observation as long as they remain in a 
region of fixed sign, and writing down as a contribution to the final or octant 
sum the number of such consecutive elements and the sign of the region in which 
they were found. There will be six terms rather than four, as was the case 
for the test based on quadrants, and so a new set of significance levels will be 
required. Table 3, following, lists the situation for a very large sample. 

The situation has been sketched for the case of 2n triples. If there are 2n + 1 
triples, then we may have trouble with the medians again. However, a similar 
device works, except that we must agree on a last variable in order to form the 
synthetic triples uniquely. For example, consider the triples ( m , 3, 5), (9, m, 1), 
(12, 4, m), where m denotes the median. Taking the order in which the vari¬ 
ables are written, we get (12, 3, 5) and (9, 4, 1) as the synthetic triples. Other 



Fig. 4. Octant schematic: solid sections taken as positive 

TABLE 3 

Working significance levels for the magnitudes of the octant sum 

Significance Level     Magnitude of Octant Sum* 

      10%                       11 
       5%                       13 
       2%                       15 
       1%                       16 
       0.5%                     18 
       0.2%                     20 
       0.1%                     21 

* Computed for large samples only and based on normal approximation; see 
section 11 for discussion of this and higher dimensional cases. 
orders would yield (9, 3, 5) and (12, 4, 1) or (9, 3, 1) and (12, 4, 5). This slight 
dissymmetry is not pleasing but should give no difficulty. 
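
The rule for forming synthetic triples can be read off from the three orderings in the example: the triple that holds the median of the variable taken last gives up its other two values, which fill the median slots of the remaining triples. The sketch below encodes that reading; it is our interpretation of the example rather than a stated algorithm, the function name is hypothetical, and it assumes the three medians fall in three different triples.

```python
M = "m"  # placeholder for a coordinate that sits on the sample median

def synthetic_triples(triples, last):
    """Fill median slots from the triple whose `last` coordinate is the median."""
    donor = next(t for t in triples if t[last] == M)
    out = []
    for t in triples:
        if t is donor:
            continue                      # the donor triple is used up
        out.append(tuple(donor[i] if v == M else v for i, v in enumerate(t)))
    return out

triples = [(M, 3, 5), (9, M, 1), (12, 4, M)]
for last in (2, 1, 0):                    # z last, then y last, then x last
    print(last, synthetic_triples(triples, last))
```

With the variables taken in the order written (z last) this gives the pair of triples quoted in the text, and the other two choices of last variable give the two alternatives.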

8. Nongraphical example. The following example of 78 successive observations 
of four variables shows how this test may be applied without plotting and 
how simple the computation still remains. The data concern a metallurgical 
problem in mass production and are taken from L. H. C. Tippett, Table XXII, 
page 63 [2]. An excerpt from the data is given in Table 4 together with 
Tippett’s calculated correlations. 

TABLE 4 

Excerpt from Tippett’s Table 

Time T*    Fuel F*    Material M*    Articles A*    Duration D* 

   1 -      240 +       1457 -         1895 +         168.5 + 
   2 -      196 -       2078 +         2121 +         152 + 
   3 -      192 -       1278 -         1437 -         153 + 
   4 -      202 +       1398 -         1497 -         145 - 
   5 -      206 +       1944 +         1592 +         153 + 
   6 -      218 +       1464 -         1506 -         147.5 - 
   7 -      155 -       1541 +         1762 +         152 + 
   8 -      201 +       1502 +         1818 +         144.5 - 
   9 -      211 +       1950 +         1144 -         151.5 + 
  10 -      236 +       1768 +         1654 +         151.5 + 
 etc. to 
  78 +      185 -       1536 +         1442 -         152 + 

 Median     Median      Median         Median         Median 
  39.5       199         1474           1588           149.5 

* Location of observation relative to column median; + = above; - = below. 

Tippett’s correlations (based on lightly rounded data) 

$r_{FM} = +0.243$ 
$r_{FA} = +0.266$ 
$r_{MA} = +0.681$ 
$r_{FM \cdot A} = +0.088$ 
$r_{FA \cdot M} = +0.141$. 

This table also shows the preliminary marking 
of each individual measurement as above (+) for its variable, below (-), or 
on the median (0). From this table we see, for example, that increasing T 
contributes a term -3 to the quadrant sum for T and D. It is often desirable to 
prepare auxiliary tables to assist in computing the components of the quadrant 
and hyperquadrant sums. Such a table is Table 5 for low values of Fuel (F-) 
arranged in consecutive ascending numerical order. The entries on this table 
for the five columns headed F, T, M, A, and D are directly comparable to the 
entries in Table 4. For example, F = 155 is - with respect to the fuel median 
and T = 7, -; M = 1541, +; A = 1762, +; D = 152, +. The double, triple, 
quadruple and quintuple headed columns contain simply the algebraic 
multiplication of the signs in the appropriate T, M, A, or D columns. Thus, TM 
for F = 155 is -, MAD is +, and TMAD is -. The contribution to each 
quadrant or hyperquadrant sum is simply the count of the consecutive like 
signs from the top of a column. For column AD, we have 7 consecutive + 
signs and since the contribution is to FAD and F is -, the contribution in this 
case to the octant sum is -7. The results from the ten tables of which Table 5 

TABLE 5 

Sample Table for One Component of Quadrant and Hyperquadrant Sums. Low 
Values of Fuel (F-) 

Fuel F    T  M  A  D    TM  TA  TD  MA  MD  AD    TMA  TMD  TAD  MAD    TMAD 

  98 -    +  -  -  -    -   -   -   +   +   +      +    +    +    -      - 
 135 -    +  -  -  -    -   -   -   +   +   +      +    +    +    -      - 
 140 -    -  -  -  -    +   +   +   +   +   +      -    -    -    -      + 
 146 -    -  -  -  -    +   +   +   +   +   +      -    -    -    -      + 
 147 -    +  +  -  -    +   -   -   -   -   +      -    -    +    +      + 
 149 -    -  +  -  -    -   +   +   -   -   +      +    +    -    +      - 
 151 -    +  -  -  -    -   -   -   +   +   +      +    +    +    -      - 
 153 -    +  -  +  -    -   +   -   -   +   -      -    +    -    +      + 
 165 -    -  +  +  +    -   -   -   +   +   +      -    -    -    +      - 

Contributions to Sums 

 FT  FM  FA  FD  FTM  FTA  FTD  FMA  FMD  FAD  FTMA  FTMD  FTAD  FMAD  FTMAD 
 -2  +4  +7  +8  +2   +2   +2   -4   -4   -7   -2    -2    -2    +4    +2 


is a sample are then carried to the summary computation shown in Table 6. 
The contribution from Table 5 is shown on line F-. The totals are computed 
and their probabilities of occurrence determined. 
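
The bookkeeping behind Table 5 is mechanical enough to sketch in code. The example below is an illustration, not Tippett's or the authors' computation: the sign rows are the T, M, A, D marks for the nine lowest fuel values as read from Table 5 (the T sign for F = 165 is not legible in the table and is inferred from the TA column), and the function names are hypothetical. Each multiple-headed column is the product of the single-variable signs, and each contribution is the length of the leading run of like signs times the sign of the margin, here F-.

```python
from itertools import combinations
from math import prod

NAMES = "TMAD"
rows = [  # (fuel, T, M, A, D) signs relative to the column medians
    (98,  +1, -1, -1, -1),
    (135, +1, -1, -1, -1),
    (140, -1, -1, -1, -1),
    (146, -1, -1, -1, -1),
    (147, +1, +1, -1, -1),
    (149, -1, +1, -1, -1),
    (151, +1, -1, -1, -1),
    (153, +1, -1, +1, -1),
    (165, -1, +1, +1, +1),   # T sign inferred, see lead-in
]

def leading_run(signs):
    """Signed count of consecutive like signs from the top of a column."""
    k = 0
    for s in signs:
        if s != signs[0]:
            break
        k += 1
    return signs[0] * k

def contributions(rows, margin_sign=-1):
    """Contribution of this margin (F-) to every quadrant or hyperquadrant sum."""
    out = {}
    for size in range(1, 5):
        for combo in combinations(range(4), size):
            column = [prod(row[1 + i] for i in combo) for row in rows]
            label = "F" + "".join(NAMES[i] for i in combo)
            out[label] = margin_sign * leading_run(column)
    return out

for label, value in contributions(rows).items():
    print(f"{label:6s} {value:+d}")
```

The printed values can be compared with the "Contributions to Sums" line of Table 5.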

9. Serial example. The following example, a sample of 144 observations of 
the thickness of inlay for relay springs cut consecutively from a single sheet of 
material, allows us to compare the resolution of the present test with that of 
the serial product-moment correlation. The data are from Shewhart [1, 1941, 
Table 1] and the serial correlations from lag 1 to lag 22 are from recent 
calculations by Miss Dorothy T. Angell. The procedure for calculating the serial 
quadrant sums is similar to that for obtaining the sums for section 8. A table 
is prepared to show the observed consecutive order of the numerical values and 
each is identified as above (+), below (-), or on the median (0). This gives a 
table similar to one of the elements, say Fuel, in Table 4. Four computation 
tables similar to Table 5 are required, one for the equivalent of moving from 
the right, one from below, one from the left, and one from the top of a lag 
correlation scatter diagram. One table from each direction will take care of all 
lags. In the first, the marginal entries are the observed values listed in 
descending numerical order. Opposite these are recorded from the previous table the 
signs associated with observations for each lag with respect to each entry. 
The second table would record the signs relating to the lags from the observed 
values arranged in ascending order. The third table would record the signs 
relating to leads from the observed values arranged in ascending order and the 
fourth, the signs relating to leads from the observed values arranged in 
descending order. The sign of the contribution from each group is the algebraic product 
of the sign of the run and the sign of the marginal entries. The length of run 
is determined in the same way as in Table 5. Table 7 illustrates the procedure 
of determining the contribution from lags associated with the observations 
arranged in ascending order. 

[TABLE 7. Contribution to Serial Quadrant Sum] 

Two serial quadrant sums may be computed: a circular serial quadrant sum 
or a noncircular serial quadrant sum. Circular items arise from considering 
that the beginning of the set of observations is a continuation of the end in the 
same way that this assumption is made in computing circular serial correlation 
coefficients. In Table 7, circular items are shown in parentheses and are omitted 
in calculating noncircular sums. In the particular table shown, the count of 
the run lengths was identical for both types of sum, but in other cases this may 
not be so. Since the serial quadrant sum is relatively insensitive to 
sample size, the noncircular serial quadrant sum has for all practical purposes 
the same distribution as the circular quadrant sum. The correspondence in 
this case between the serial correlation coefficient for each lag up to 22 and 
the respective values of the two types of serial quadrant sums is shown in Fig. 5. 

Fig. 5. Comparative performance on a serial (autocorrelative) example 

10. Convergence to the limiting distribution. We shall consider several 
chance sums. One of these is $S$, which has the limiting distribution discussed 
in section 6. Another is $S_k$, which is the sum of four independent terms, each 
distributed according to the limiting distribution curtailed at k. Its generating 
function is 

$$G_k(x) = \left( \sum_{i=1}^{k} 2^{-(i+1)} x^{i} + \sum_{i=1}^{k} 2^{-(i+1)} x^{-i} \right)^{4}.$$

The total probability assigned to $S_k = -4k, -(4k - 1), \ldots, 4k$, is less than unity, 
so that there is nonzero probability that $S_k$ is not defined. The third is $S_n$, 
the quadrant sum itself, whose generating function is (5), and the fourth is the 
result of the same sort of curtailment applied to $S_n$. It will be denoted by 
$S_{n,k}$ and its generating function is 

$$G_{n,k}(x) = \sum_{j} \frac{(j!)^2 ((n - j)!)^2}{(2n)!} \left( \sum_{i=1}^{k} \binom{n - i - 1}{n - j - 1} x^{i} + \sum_{i=1}^{k} \binom{n - i - 1}{j - 1} x^{-i} \right)^{4}.$$

This again corresponds to a total probability less than unity. 
It is clear that 

$$\Pr(S_{n,k} = m) \le \Pr(S_n = m)$$

and 

$$\Pr(S_k = m) \le \Pr(S = m).$$

We shall soon show that 

$$\lim_{n \to \infty} \Pr(S_{n,k} = m) = \Pr(S_k = m) \tag{6}$$

and this will imply that 

$$\lim_{n \to \infty} \Pr(S_n = m) = \Pr(S = m)$$

which is the desired result. The implication runs as follows: given $\epsilon$, we can 
choose k so large that 

$$\Pr(S_k \text{ defined}) > 1 - \epsilon/3$$

whence 

$$|\Pr(S_k = m) - \Pr(S = m)| < \epsilon/3$$

and then choose n so large that 

$$|\Pr(S_{n,k} = m) - \Pr(S_k = m)| < \epsilon/(24k + 6)$$

for $m = -4k, -4k + 1, \ldots, 4k$, whence 

$$\Pr(S_{n,k} \text{ defined}) > 1 - \epsilon/3 - \frac{8k + 1}{24k + 6}\,\epsilon = 1 - \frac{16k + 3}{24k + 6}\,\epsilon$$

and hence 

$$|\Pr(S_{n,k} = m) - \Pr(S_n = m)| < \frac{16k + 3}{24k + 6}\,\epsilon,$$

this inequality holding automatically for $|m| > 4k$. Hence, 

$$|\Pr(S_n = m) - \Pr(S = m)| \le |\Pr(S_n = m) - \Pr(S_{n,k} = m)| + |\Pr(S_{n,k} = m) - \Pr(S_k = m)| + |\Pr(S_k = m) - \Pr(S = m)| < \frac{16k + 3}{24k + 6}\,\epsilon + \frac{1}{24k + 6}\,\epsilon + \frac{\epsilon}{3} = \epsilon.$$

This method is clearly of general application in such problems. 

We turn now to the proof of (6). The expression for $G_{n,k}(x)$ shows that we 
may consider it the result of the following process: the integer j is a chance 
quantity with the distribution 

$$\Pr(j) = \binom{n}{j}^{2} \Big/ \binom{2n}{n},$$

and $G_{n,k}$ is the average over j of 

$$G_{n,k,j}(x) = \left( \sum_{i=1}^{k} \frac{\binom{n - i - 1}{n - j - 1}}{\binom{n}{j}} x^{i} + \sum_{i=1}^{k} \frac{\binom{n - i - 1}{j - 1}}{\binom{n}{j}} x^{-i} \right)^{4}.$$

The first of these relations shows that $j/n$ converges stochastically to $\tfrac12$ as n 
approaches infinity. The second shows, since 

$$\frac{\binom{n - i - 1}{n - j - 1}}{\binom{n}{j}} = \frac{(n - i - 1)!\,(n - j)!\,j!}{(n - j - 1)!\,(j - i)!\,n!} = \frac{(n - j)(j)(j - 1) \cdots (j - i + 1)}{n(n - 1)(n - 2) \cdots (n - i)}$$

$$\frac{\binom{n - i - 1}{j - 1}}{\binom{n}{j}} = \frac{(n - i - 1)!\,(n - j)!\,j!}{(n - j - i)!\,(j - 1)!\,n!} = \frac{(n - j)(n - j - 1) \cdots (n - j - i + 1)\,j}{n(n - 1) \cdots (n - i)}$$

and both of these converge stochastically to $2^{-(i+1)}$ as n approaches infinity, 
that $G_{n,k,j}(x)$ converges stochastically to $G_k(x)$. Since these curtailed generating 
functions involve only powers of x in the finite range between $-4k$ and $+4k$, 
the limiting relation (6) follows at once. 
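
The two limits used in this proof can be illustrated numerically. The sketch below is ours and makes one assumption explicit: the distribution of j is taken to be C(n, j)^2 / C(2n, n), which is what the weights in (5) give once each of the four terms is normalized by C(n, j). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def j_prob(n, j):
    """Assumed distribution of j: C(n, j)^2 / C(2n, n)."""
    return Fraction(comb(n, j) ** 2, comb(2 * n, n))

def positive_weight(n, j, i):
    """C(n-i-1, n-j-1) / C(n, j): chance that a term equals +i for fixed j."""
    return Fraction(comb(n - i - 1, n - j - 1), comb(n, j))

def negative_weight(n, j, i):
    """C(n-i-1, j-1) / C(n, j): chance that a term equals -i for fixed j."""
    return Fraction(comb(n - i - 1, j - 1), comb(n, j))

for n in (20, 200, 2000):
    j = n // 2                      # j/n concentrates near 1/2
    near_half = sum(j_prob(n, t) for t in range(n + 1)
                    if abs(t / n - 0.5) <= 0.05)
    weights = [float(positive_weight(n, j, i)) for i in (1, 2, 3)]
    print(n, round(float(near_half), 4), weights)
print("limits:", [2.0 ** -(i + 1) for i in (1, 2, 3)])
```

As n grows, the mass of j within a fixed band around n/2 approaches one, and the per-term weights at j = n/2 approach the limiting values 2^-(i+1).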

11. Effectiveness of normal approximation. Fig. 3 shows the relation between 
the asymptotic distribution of the quadrant sum for large n and a normal 
distribution with variance 24, i.e., the same variance as that of the asymptotic 
distribution. The normal approximation is calculated from 

$$\Pr(|S_n| \ge m) \approx \Pr\left(|x| \ge \frac{m - \tfrac12}{\sqrt{24}}\right)$$

where x is normally distributed with zero mean and unit variance. The 
asymptotic and normal curves agree surprisingly well out to the 5% point, and an error 
of a full unit in the significance level first occurs beyond the 0.5% point. 

Since the asymptotic distributions for the quadrant, octant, hexadecant, 
dotriacontant, ..., sums become more and more normal, the normal approximation 
will be even better for higher dimensions. In r dimensions, this approximation 
consists in treating 

$$\frac{|S_n| - \tfrac12}{\sqrt{12r}}$$

as the absolute value of a standard deviate. This should be quite adequate for 
large samples and r > 4. 
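
As an illustration of how this approximation yields working significance levels, the sketch below finds, for r = 3, the smallest magnitude whose approximate two-sided tail probability does not exceed each level. Two points are assumptions on our part: the half-unit continuity correction, which is the reading of the formula that reproduces the magnitudes of Table 3, and the use of the limiting variance 12r. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def tail_normal(m, r):
    """Normal approximation to Pr(|sum| >= m) in r dimensions (variance 12r)."""
    z = (m - 0.5) / sqrt(12 * r)
    return 2.0 * (1.0 - NormalDist().cdf(z))

def working_magnitude(level, r):
    """Smallest integer m whose approximate tail probability is <= level."""
    m = 1
    while tail_normal(m, r) > level:
        m += 1
    return m

levels = [0.10, 0.05, 0.02, 0.01, 0.005, 0.002, 0.001]
octant = [working_magnitude(a, 3) for a in levels]
print(octant)   # compare with the magnitudes listed in Table 3
```

The same function with r = 2 gives the normal-approximation levels for the quadrant sum, which can be set against the exact limiting probabilities.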


12. Unsolved problems. The central unsolved problem in connection with 
the quadrant sum is: 

(1) What is the operating characteristic? 

This has as a corollary the more general question: 

(2) How can the operating characteristic of a nonparametric test be 
described so as to be useful to the users of the test? 

There are, of course, minor problems which are much more easily soluble. A 
few, listed in order of practical importance, are: 

(3) What is the effect on the significance levels of the use of lagged values 
of x as values of y? 

(4) What are the exact distributions for moderate n in three or more 
dimensions? 

(5) Do the analogous limiting distributions hold for three or more 
dimensions? 

(6) What is a better approximation to the limiting distribution for moderate 
n? 

To encourage others to solve some of these, we close with the assurance that 
they have our good wishes. 
DISCRIMINANT FUNCTIONS 

By George W. Brown 
Iowa State College 

1. Introduction: In the following sections the development of discriminant 
function techniques is approached from an elementary point of view, considering 
first an essentially trivial problem, then working up to the more complex situa¬ 
tions which may be handled by discriminant function methods. No attempt 
has been made to follow the pattern of the historical development in this process, 
and no consistent attempt has been made to allocate proper credit, in the text, 
to those individuals responsible for the introduction and exploitation of these 
methods. A more or less exhaustive bibliography of discriminant function 
applications and related theory is given at the end of this paper. 

Some historical perspective may be gained, however, from a very sketchy 
consideration of the early background of the subject. The first published 
application of the discriminant function seems to have been the work of Barnard 
(1935, [1]) on craniometry, following the suggestion of R. A. Fisher. Meanwhile 
P. C. Mahalanobis (1927, [30]; 1930, [31]) and, in this country, Hotelling (1931, 
[25]) had been concerned with a closely related problem, the construction of 
measures of the “distance” between two sets of multiple measurements, for which 
Karl Pearson’s (1926, [34]) coefficient of racial likeness was not wholly adequate. 
Fisher (1936, [18]) gave a further example of the method and showed (1938, [19]) 
the relation between his work and that of Hotelling (1931, [25]; 1936, [27]). Thus 
the theory of discriminant function analysis proper is about ten years old, but is 
intimately related to researches which go back a few more years. 

A simple problem: Consider the very simple case of a single measurement, say 
$\xi$, which may be made in each of two populations, and let us suppose, for the 
sake of discussion, that $\xi$ is normally distributed, with unit variance, in each 
population, but with possibly different means in the two populations. 

Let 

$$E_1(\xi) = \alpha - \beta$$
$$E_2(\xi) = \alpha + \beta$$

be the mean values of $\xi$ over the two populations, with $\beta > 0$. As an example, 
we may consider the pH measurements of Iowa soil samples (Cox and Martin, 
[12]), for two soil populations, distinguished by the presence or absence of 
Azotobacter. From 100 samples containing Azotobacter and 186 samples containing 
no Azotobacter, we have the estimated averages of pH equal to 7.423 and 6.015 
respectively, with an estimated standard error of .625 within populations (see 
Fig. 1). 

$\alpha = 6.719$ 
$\beta = .704$ 
$\sigma = .625$ 
$\beta/\sigma = 1.13$. 

Fig. 1. Distribution of pH Measurements 

Let us suppose further that $\xi$ is the only measurement available on a single 
individual, not knowing to which of populations 1 and 2 the individual belongs. 
The problem is to classify this individual as a member of population 1 or 
population 2. It is clear that $\xi$ furnishes the only information on which to base a 
decision, and that essentially the only procedure available is to choose a number, 
say $\xi_0$, such that we choose population 1 when $\xi < \xi_0$ and population 2 when 
$\xi > \xi_0$. Furthermore, it is evident that the expected accuracy of classification 
depends on the size of $\beta$. If we wish to have equal risks of misclassification for 
members of the two populations we choose $\xi_0 = \alpha$. Then the probability of 
misclassification is given by $P(\epsilon > \beta)$, where $\epsilon$ is a normal deviate with unit 
variance. As one would expect, the probability of misclassification tends to 0 as 
$\beta \to \infty$ and tends to $\tfrac12$ as $\beta \to 0$. In the Azotobacter example, if we assume 
that the estimates given are the population values, we choose $\xi_0 = 6.719$. The 
ratio $\beta/\sigma = 1.13$ is exceeded approximately 13% of the time in sampling from 
the normal distribution, leading to .13 as the probability of misclassification. 
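
The arithmetic of the Azotobacter example can be retraced in a few lines. This is only a sketch: it treats the quoted estimates as population values, as the text does, and the variable names are ours.

```python
from statistics import NormalDist

mean_with, mean_without = 7.423, 6.015   # estimated pH means quoted in the text
sigma = 0.625                            # estimated within-population standard error

alpha = (mean_with + mean_without) / 2   # separation value for equal risks
beta = (mean_with - mean_without) / 2    # half the distance between the means
ratio = beta / sigma                     # separation in standard-error units

# Probability that a unit normal deviate exceeds beta / sigma
p_misclass = 1.0 - NormalDist().cdf(ratio)

print(round(alpha, 3), round(beta, 3), round(ratio, 2), round(p_misclass, 2))
```

Because the two risks are equal, the same probability applies to an individual from either population.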

Consider now the slightly more general situation in which we consider a fixed 
variate, say w, with measurements $\xi$ distributed, for fixed w, with a mean of the 
form $\alpha + \beta w$. This is the standard regression situation. As before assume 
that $\xi$ is normally distributed about this mean with unit variance, that is 

$$\xi = \alpha + \beta w + \epsilon$$

where $\alpha$ and $\beta$ are constants, w may take on any or all real values, and $\epsilon$ is a 
normal deviate. Note that if w is restricted to take on only two values the 
structure reduces to the first structure considered. An example of the continuous 
type might be constructed by considering w as genotypic yield of grain and $\xi$ 
a phenotypic measure of yield (Smith, [36]). 

The simple problem formulated for the two-population case may be 
reformulated here as follows: Given the relationship $\xi = \alpha + \beta w + \epsilon$, and given $\xi$ for 
an individual for which no other information is known, how shall we estimate w? 
For selective breeding the problem may be to select individuals for which w is 
at one end of the scale, rather than to estimate w itself. Whatever decision is 
to be made, it is still clear that $\xi$ furnishes the only available information, and 
that the certainty of the decision is a function of $\beta$. Since $(\xi - \alpha)/\beta = w + \epsilon/\beta$, 
the variance of this estimate of w is $1/\beta^2$. Note that confidence intervals for w, 
given $\xi$, may be constructed from the normally distributed quantity $\xi - \alpha - \beta w$. 

It should be pointed out that in the usual regression case we are interested in 
predicting $\xi$ for given w, with the hypothesis as stated above, whereas in this 
case $\xi$ will be observed, and the problem is that of estimating, as a parameter of 
the distribution of $\xi$, the fixed variate w. 

Obviously $\beta$ must not vanish if $\xi$ is to perform any discrimination among w 
values. In practice, of course, $\alpha$ and $\beta$ will not be given as known values and the 
variance of $\epsilon$ will not be known, but a finite set of observations may be available, 
for which w values are known and $\xi$ has been observed. The usual analysis of 
variance provides a significance test for the non-vanishing of $\beta$, which is 
equivalent to testing for the significance of the regression of $\xi$ on w. 

It is to be noted that this analysis reduces to the conventional between-within 
analysis (F or t-test) when we have the special case of two populations. 
Moreover, if we had treated $\xi$ as the fixed variate instead of w, and considered the 
regression of w on $\xi$, the Analysis of Variance would have differed only in replacing 
$\sum(\xi - \bar{\xi})^2$ throughout by $\sum(w - \bar{w})^2$ and the relevant F-test would have been 
unchanged. 

When probabilities of misclassification are estimated from finite samples, as 
in the soil classification example, there are three sources of error: sampling error 
in the estimate of the separation value $\xi_0$, sampling error in the estimate of the 
distance between the population means, and sampling error in the estimated 
standard deviation of $\xi$ within populations. It does not appear difficult to set 
up confidence intervals for the probability of misclassification, assuming repeated 
classification of individuals given fixed initial samples. 
2. The one-dimensional discriminant function. We have been dealing so far 
with the simple situation in which only one measurement per individual is 
available for purposes of discrimination. Suppose we still have this measurement, 
call it $\xi_1$, now, but we have other measurements as well, say $\xi_2, \ldots, \xi_p$. 
As before $\xi_1 = \alpha_1 + \beta w + \epsilon_1$. For the moment suppose that the remaining 
measurements have mean values independent of w, so that 

$$\xi_m = \alpha_m + \epsilon_m, \quad (m = 2, \ldots, p),$$

and let us assume also that the $\{\epsilon_m\}$ are mutually independent, $(m = 1, 2, \ldots, p)$ 
and are normal deviates with unit variance. It is safe to assume that nobody 
would ever argue, in this case, that the measurements $\xi_2, \ldots, \xi_p$ provide 
information about the w value for an individual. If, then, we were so fortunate 
that we were in this situation, and knew so, we could say that $\xi_1$ is our 
discriminant function, since, if any discriminating is to be done, $\xi_1$ has to do it. 


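TABLE 1 

Analysis of Variance for Regression 

               d.f.       Sums of Squares 

Regression     1          $r^2 \sum(\xi - \bar{\xi})^2$ 
Error          N - 2      $(1 - r^2) \sum(\xi - \bar{\xi})^2$ 
Total          N - 1      $\sum(\xi - \bar{\xi})^2$ 

$$r = \frac{\sum(\xi - \bar{\xi})(w - \bar{w})}{\sqrt{\sum(\xi - \bar{\xi})^2 \sum(w - \bar{w})^2}}$$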


Suppose, now, that the measurements $\xi_1, \xi_2, \ldots, \xi_p$ are not explicitly 
available, but that we are able to observe a linearly equivalent set $x_1, x_2, \ldots, x_p$, 
related to the $\{\xi_m\}$ by the transformation 

$$x_m = \sum_{n=1}^{p} l_{mn} \xi_n$$

where the $l_{mn}$ are unknown. For fixed w, $x_m$ has expected value 

$$\sum_{n=1}^{p} l_{mn} \alpha_n + l_{m1} \beta w = a_m + b_m w,$$

so that in general each $x_m$ observation provides information about w. Moreover, 
the $x_m$ are not in general mutually independent; it is evident that the 
population matrix of variances and covariances for fixed w is given by 

$$\sigma_{mn} = \sum_{k} l_{mk} l_{nk}.$$

As an example of a set of correlated measurements, consider the Azotobacter 
example referred to above. In addition to pH values, determinations of available 
phosphate content and total nitrogen content were made on soil samples 
in each of the two populations. Means were as follows: 

                                            pH       Phosphate    Nitrogen 

Mean of 100 samples with Azotobacter        7.423    133.120      29.400 
Mean of 186 samples without Azotobacter     6.015     51.113      21.140 
Mean difference                             1.408     82.007       8.260 

Clearly the differences are proportional to the hypothetical $b_m$’s. The 
variance-covariance matrix, estimated from the 284 degrees of freedom within populations, 
is given by Table 2. 


TABLE 2 

$284(\sigma_{mn})$ = 

               pH          Phosphate          Nitrogen 

pH             111.0879    2,292.7192         198.4026 
Phosphate                  1,042,799.1890     5,066.2645 
Nitrogen                                      29,422.3655 


Estimated correlation coefficients within populations are not large, .213 for pH 
and Phosphate, .110 for pH and Nitrogen, and .029 for Phosphate and Nitrogen. 

Another example is furnished by Fisher’s Iris measurements [8], providing 
sepal length, sepal width, petal length, and petal width for each of 50 
individuals of Iris setosa and 50 individuals of Iris versicolor. This example is 
an unfortunate one in that either petal length or petal width alone is sufficient 
to discriminate the two populations as completely as anybody has a right to 
expect anytime. The petal lengths, for example, vary between 1.0 and 1.9 cm. 
for the 50 setosa, and between 3.0 and 5.1 cm. for the 50 versicolor. 

Let us proceed, under the assumption that available measurements, $x_m$,
are distributed normally about mean values $a_m + b_m w$, with variance-covariance
matrix $\sigma_{mn}$ for fixed $w$, keeping in mind the underlying model of
$\xi_1, \xi_2, \ldots, \xi_p$, with

$$x_m = \sum_{n=1}^{p} l_{mn} \xi_n, \quad \xi_1 = \alpha_1 + \beta w + \epsilon_1, \quad \xi_2 = \alpha_2 + \epsilon_2; \; \ldots; \; \xi_p = \alpha_p + \epsilon_p.$$

The skeptic may wish to grant the first part of our assumptions without granting
the hypothetical structure of $\xi$'s underlying the $x$'s. Hotelling's work [27]
shows that such an underlying structure of $\xi$'s may always be provided, given
the distribution of $x$'s for fixed $w$. In other words, a distribution of $x$'s for fixed
$w$ leads essentially uniquely to an underlying $\xi$ model.

The discriminant function, given $\sigma_{mn}$, $a_m$ and $b_m$, for $m, n = 1, 2, \ldots, p$,
is

$$X = \sum_{m,n} \sigma^{mn} b_m x_n = \sum_{n} l_n x_n,$$

where

$$l_n = \sum_{m=1}^{p} \sigma^{mn} b_m,$$

and $\sigma^{mn}$ is the reciprocal matrix to $\sigma_{mn}$. That is, the $\sigma^{mn}$ are the
solutions of the linear systems [17]

$$\sum_{k} \sigma^{mk} \sigma_{nk} = 0 \text{ if } m \neq n; \quad m, n = 1, 2, \ldots, p,$$

$$\sum_{k} \sigma^{mk} \sigma_{mk} = 1; \quad m = 1, \ldots, p.$$

That $X$, as defined above, is properly called the discriminant function will become
evident immediately. Putting $b_m = l_{m1} \beta$, $x_n = \sum_k l_{nk} \xi_k$, we have

$$X = \beta \sum_{m,n,k} \sigma^{mn} l_{m1} l_{nk} \xi_k.$$

Recalling that the $\sigma^{mn}$ are reciprocal to $\sigma_{mn} = \sum_k l_{mk} l_{nk}$, it can be seen
that $\sum_{m,n} \sigma^{mn} l_{m1} l_{nk} = 1$ if $k = 1$, and vanishes for $k \neq 1$. It follows that

$$X = \beta \xi_1,$$

in other words, $X$ calculated as $\sum_m \sum_n \sigma^{mn} b_m x_n$ from known population quantities
is proportional to the hypothetical $\xi_1$, the only one of the underlying measurements
which is related to $w$, thus justifying the term discriminant function for
$X$. It is clear that any other linear function of the $x$'s is also a linear function of
the $\xi$'s, and can discriminate, at best, only as well as $X$ itself, since all the $\xi$'s
are independent of $w$, with the exception of $\xi_1$. $X$ itself discriminates $w$ to the
same extent that $\xi_1$, were it available, would discriminate.
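
The identity $X = \beta \xi_1$ can be checked numerically. The sketch below is only an illustration under stated assumptions: the mixing matrix `L` (standing for the $l_{mn}$), the value of `beta`, and the helper `solve` are all hypothetical choices made for the example, and nothing beyond the Python standard library is used.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


rng = random.Random(0)
p, beta = 4, 2.5  # hypothetical dimension and slope

# Hypothetical l_mn; the large diagonal keeps the matrix safely non-singular.
L = [[rng.uniform(-1, 1) + (5.0 if m == n else 0.0) for n in range(p)]
     for m in range(p)]

# sigma_mn = sum_k l_mk l_nk, and b_m = l_m1 * beta.
sigma = [[sum(L[m][k] * L[n][k] for k in range(p)) for n in range(p)]
         for m in range(p)]
b = [L[m][0] * beta for m in range(p)]

# Discriminant coefficients: solve sigma * coef = b (sigma is symmetric).
coef = solve(sigma, b)

# Any xi vector will do; x is its observable linear transform.
xi = [rng.gauss(0, 1) for _ in range(p)]
x = [sum(L[m][n] * xi[n] for n in range(p)) for m in range(p)]

X = sum(coef[n] * x[n] for n in range(p))
print(X, beta * xi[0])  # the two numbers agree up to rounding error
```

Solving the linear system instead of forming the reciprocal matrix explicitly gives the same coefficients with less arithmetic.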
The degree of discrimination of $w$'s depends, as indicated in the previous section,
on the ratio of the mean square of $\xi_1$ among $w$'s (mean square for regression),
to the mean square of $\xi_1$ for fixed $w$ (mean square for error). Since $X$
is proportional to $\xi_1$, the same is true when $X$ is substituted for $\xi_1$. It turns
out, of course, that $X$ is that linear combination of $x$'s for which the ratio of the
mean square for regression to the mean square for error is a maximum, or, what
is the same thing, $X$ is that linear combination of $x$'s which has a maximum
correlation with $w$. From any point of view $X$ appears to be the logical function
of $x$'s to compute. It is clear that $\lambda X$ is precisely as good as $X$, if $\lambda$ is any
non-zero constant.

In the two population case, where $w$ takes on only two values, $X$ is evidently
proportional to $\sum \sigma^{mn} (\mu_{m1} - \mu_{m2}) x_n$, where $\mu_{m1}$ and $\mu_{m2}$ are the mean values of
$x_m$ in the two populations. $X$ is here the particular linear combination of $x$'s
for which the ratio of the mean square between populations to the mean square
within populations is a maximum. The value of this ratio, which measures the
degree of discrimination possible, depends on the spread of the means of $X$
between the populations, or in general, on the spread of the means of $X$ over some
given distribution of $w$'s. Given $\sigma_{mn}$ and $b_m$, the larger the spread of $w$ values
the better overall discrimination will be obtainable. On the other hand, the
coefficients for $X$ depend only on $\sigma_{mn}$ and $b_m$.

Since $X$ is proportional to $\xi_1$, it follows that the discriminant function is invariant
under non-singular linear transformation of the $x$'s, that is, if some set of
$y$'s, linearly dependent on the $x$'s, had been observed, together with their means,
variances and covariances, the discriminant values would not have changed.
This invariance is obviously a desirable property, and as such was one of the
goals of Fisher, Hotelling, and Mahalanobis. One more property of the discriminant
function is of interest; $X$ is essentially equivalent to the maximum
likelihood estimate of $w$.

In our statistical model $w$ plays the role of a fixed variate or population parameter,
and the $x$'s have a joint distribution about linear functions of $w$ as means.
Suppose now that $(\sigma_{mn})$ and $(b_m)$ are estimated from an analysis of variance
and covariance on data for which $w$ as well as $x$ values are known. The problem
of estimating $w$ for a single individual whose $x$ measurements are given resolves
into a two-stage estimation process, the first stage being the estimation of
$(\sigma_{mn})$ and $(b_m)$ from the initial data, the second stage being the estimation of $w$
by the discriminant function whose coefficients are computed from the estimated
$(\sigma_{mn})$ and $(b_m)$. It has already been pointed out that $X$ is the linear
combination of $x$'s which has greatest correlation with $w$. It turns out, then,
that the coefficients of $X$ are proportional to those which would have been obtained
from a formal regression analysis of $w$ on $x_1, x_2, \ldots, x_p$, considering the
$x$'s as independent variables and $w$ as dependent variable, a direct interchange of
roles as compared with the statistical model we have assumed. Of course two
linear functions differing only by a factor of proportionality are equivalent in
discrimination. If the formal analysis of variance is carried out for testing the
significance of the regression of $w$ on $x_1, x_2, \ldots, x_p$, the relevant $F$ ratio remains
a valid test for the non-vanishing of the $b_m$ in spite of the inversion of
dependent and independent variables. The analysis of variance is given in
Table 3.

TABLE 3
Analysis of Variance for Regression

                  df             Sums of Squares

Regression        p              $R^2 \sum (w - \bar{w})^2$
Error             N - p - 1      $(1 - R^2) \sum (w - \bar{w})^2$
Total             N - 1          $\sum (w - \bar{w})^2$

$R$ is, of course, the conventional multiple correlation coefficient. An equivalent
analysis can be carried out for $X$ itself, allowing sufficient degrees of freedom
for the estimation of the constants in $X$, as given in Table 4.

TABLE 4
Analysis of Variance for X on w

                  df             Sums of Squares

Regression        p              $R^2 \sum (X - \bar{X})^2$
Error             N - p - 1      $(1 - R^2) \sum (X - \bar{X})^2$
Total             N - 1          $\sum (X - \bar{X})^2$

This analysis is proportional to the analysis given above. It might be noted
that the mean square corresponding to error sum of squares in this analysis is
$\sum \sigma^{mn} b_m b_n$, which is $X$ evaluated for $x_n = b_n$, ($n = 1, 2, \ldots, p$).

In the Azotobacter example, Cox and Martin arrive at a discriminant function
which has the analysis given in Table 5.

TABLE 5
Analysis of Variance of Discriminant Function

                          df     Sums of Squares     Mean Square

Between populations         3        .030842           .01028
Within populations        282        .021777           .00007722
Total                     285

It is evident that the difference between populations is highly significant. The
choice of scale for $X$ in this case forces the sum of squares within populations to
be equal to the difference between the mean $X$ values for the two populations.
Thus the mean $X$ differs by .021777 for the two populations, and has an estimated
standard error, within populations, equal to $\sqrt{.00007722} = .0088$.
Half the difference, divided by the standard error, is the normal deviate corresponding
to misclassification, if equal risks are taken. In this case the value
of the normal deviate is 1.24, approximately, leading to an estimated probability
of misclassification of about .11, which is not very much better than the .13
which one would have obtained if pH alone had been used.
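
The arithmetic of the Azotobacter example can be retraced from Table 2 and the table of means. The sketch below rests on an assumption that is not stated explicitly in the text: that the scale of $X$ is fixed by taking as coefficients the solution of $S l = d$, where $S$ is the matrix of Table 2 and $d$ the vector of mean differences. Under that scaling the difference between the mean $X$ values equals the within sum of squares of $X$. The helper `solve` and all variable names are illustrative.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


# 284 * (sigma_mn): within-population sums of squares and products (Table 2).
S = [
    [111.0879, 2292.7192, 198.4026],
    [2292.7192, 1042799.1890, 5066.2645],
    [198.4026, 5066.2645, 29422.3655],
]
# Mean differences (with minus without Azotobacter): pH, phosphate, nitrogen.
d = [1.408, 82.007, 8.260]

coef = solve(S, d)                         # assumed scaling of the coefficients
D = sum(c * di for c, di in zip(coef, d))  # difference between the mean X values
within_df = 282                            # within-population df, as in Table 5
std_error = math.sqrt(D / within_df)       # D doubles as the within sum of squares
deviate = (D / 2) / std_error              # half the difference over the std. error
p_misclass = 0.5 * math.erfc(deviate / math.sqrt(2))  # upper normal tail area

print(round(D, 6), round(std_error, 4), round(deviate, 2), round(p_misclass, 2))
```

The printed values can be compared with the figures .021777, .0088, 1.24 and .11 quoted in the text.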

In this problem, as in conventional regression analysis, it is tempting, for
various reasons, to consider the possibility of using smaller sets of classifying
measurements. Moreover, a significance test for this situation is in general
more interesting, as a practical matter, than the significance test for differences
among populations, since the initial presumption is that we are interested in
being able to discriminate, on the basis of $x_1, x_2, \ldots, x_p$. Suppose, for example,
we wish to test whether the discriminant function $X$ based on
$x_1, x_2, \ldots, x_p$ is significantly better than the discriminant function based on
$x_1, \ldots, x_r$, with $r < p$. The relevant test is precisely the same as the test
calculated formally from the regression of $w$ on the sets $x_1, \ldots, x_r$ and
$x_1, x_2, \ldots, x_p$, with the analysis of variance given in Table 6.

TABLE 6
Analysis of Variance for Rejecting $x_{r+1}, \ldots, x_p$

Sums of Squares                                                           df

$S_r^2$             Regression on $x_1, \ldots, x_r$                      r
$S_p^2$             Regression on $x_1, \ldots, x_r, x_{r+1}, \ldots, x_p$   p
$S_p^2 - S_r^2$     Difference                                            p - r
$S_T^2 - S_p^2$     Error                                                 N - p - 1
$S_T^2$             Total                                                 N - 1

Similarly, if we wish to test for the significance of a theoretical discriminant
function, $X_0$, with preassigned coefficients, as compared with $X_p$, we have
again the conventional test calculated from the formal analysis of the regression
of $w$ on $x_1, x_2, \ldots, x_p$, as given in Table 7.

TABLE 7
Analysis of Variance for $X = X_0$

Sums of Squares                                                           df

$S_0^2$             Regression on $X_0$                                   1
$S_p^2$             Regression on $x_1, \ldots, x_p$                      p
$S_p^2 - S_0^2$     Difference                                            p - 1
$S_T^2 - S_p^2$     Error                                                 N - p - 1
$S_T^2$             Total                                                 N - 1

As shown by Fisher [21] the relevant $F$-test for this hypothesis is computable
as

$$\frac{n - p + 1}{p - 1} \cdot \frac{R'^2}{1 - R'^2},$$

where $R'^2 = R^2(1 - r^2)$, $r$ is the correlation between $X$ and $X_0$ for fixed $w$, and
$R$ is the multiple correlation for $w$ on $x_1, \ldots, x_p$, or, what is the same thing, the
correlation of $w$ and $X$.

The example of Smith [36] is an example in which the relationships of $x$'s
to $w$ have to be estimated from analysis of variance and covariance of data in
which the $w$'s are not really known, being related to genotypes. The regression
of $x$'s on $w$ is estimated by a generalization of the components-of-variance
method, from variance-covariance analyses in which the usual null hypotheses
are significantly contradicted. The net effect is that the usual significance
tests now fail to hold, although the algebraic calculations are formally equivalent
to those given above, once the population relations of $x$'s to $w$ are established.
When work of this kind is based on small samples, there is some difficulty in
estimating the reliability of the results.

3. Multi-dimensional discriminant functions. Instead of trying to discriminate
between two populations or estimate a single parameter $w$, our problem may
be to discriminate among several populations, not necessarily linearly related,
or to estimate many independent parameters $w_1, w_2, \ldots, w_s$. Just as a single
parameter $w$ is sufficient to distinguish between means of measurements for two
different populations, $s$ parameters are sufficient to distinguish between means
of $s + 1$ different populations, and exactly $s$ parameters will be required, if
no linear relation obtains among the $s + 1$ populations. For example, with
three populations, any measurement mean may be given the three possible
values $\alpha$, $\alpha + \beta$, $\alpha + \gamma$, corresponding to $w_1 = w_2 = 0$ for population 1, $w_1 = 1$,
$w_2 = 0$ for population 2, and $w_1 = 0$, $w_2 = 1$ for population 3. Geometrically
we have to consider a set of parameter values as a point in an $s$-dimensional
space.

The one-dimensional discriminant function admits two very different generalizations
in higher dimensions. The practical solution to a particular problem
for which $s$ is moderately large may involve a mixture of both generalizations.

Let us generalize our statistical model before discussing the discrimination
problem. To avoid complication of algebraic notation, let us for the moment
assume $s = 2$. We will now postulate a set of hypothetical measurements
$\xi_1, \xi_2, \ldots, \xi_p$, with

$$\xi_1 = \alpha_1 + \beta_1 u + \gamma_1 v + \epsilon_1$$
$$\xi_2 = \alpha_2 + \beta_2 u + \gamma_2 v + \epsilon_2$$
$$\xi_3 = \alpha_3 + \epsilon_3$$
$$\cdots$$
$$\xi_p = \alpha_p + \epsilon_p,$$

where the $\epsilon_n$ are independent normal deviates with unit variance, $u$ and $v$ are
fixed variates or parameters corresponding to the different populations, and
$\alpha_1, \alpha_2, \ldots, \alpha_p$, $\beta_1$, $\beta_2$, $\gamma_1$, and $\gamma_2$ are constants. Evidently $\xi_3, \ldots, \xi_p$ can
yield no information about $u$ and $v$; $\xi_1$ and $\xi_2$ together contain all the information
there is to get about $u$ and $v$. As before, assume that our data will be in the form
of linear combinations $x_m = \sum l_{mn} \xi_n$, with unknown coefficients $l_{mn}$. The
variance-covariance matrix within populations, or for fixed $u$, $v$, is still given by
$\sigma_{mn} = \sum_k l_{mk} l_{nk}$. The mean values of the $x$'s for fixed $u$, $v$ are given by

$$E(x_m) = \sum_n l_{mn} \alpha_n + (l_{m1} \beta_1 + l_{m2} \beta_2) u + (l_{m1} \gamma_1 + l_{m2} \gamma_2) v = a_m + b_m u + c_m v.$$

This model is again justifiable on the basis of Hotelling's work.

The first question to ask is whether we can now form two linear combinations
of the $x$'s and get rid of $\xi_3, \ldots, \xi_p$ in both, thus providing a two dimensional
description of an individual on the basis of $x_1, x_2, \ldots, x_p$. The answer here
is in the affirmative, as a result of a direct generalization of the method discussed
earlier. If we calculate $X_1 = \sum \sigma^{mn} b_m x_n$ and $X_2 = \sum \sigma^{mn} c_m x_n$, we are
fortunate enough to get

$$X_1 = \beta_1 \xi_1 + \beta_2 \xi_2$$
$$X_2 = \gamma_1 \xi_1 + \gamma_2 \xi_2$$

with no disturbing elements from $\xi_3, \ldots, \xi_p$. Assuming for now that $X_1$ and
$X_2$ are not merely proportional, i.e. $\beta_1 \gamma_2 - \beta_2 \gamma_1 \neq 0$, what do we do with $X_1$ and
$X_2$?

For fixed $u$, $v$, we have

$$E(X_1) = \sum \sigma^{mn} b_m a_n + u \sum \sigma^{mn} b_m b_n + v \sum \sigma^{mn} b_m c_n = A_1 + B_1 u + C_1 v$$
$$E(X_2) = \sum \sigma^{mn} c_m a_n + u \sum \sigma^{mn} c_m b_n + v \sum \sigma^{mn} c_m c_n = A_2 + B_2 u + C_2 v$$

and variances and covariance

$$\tau_{11} = \sum \sigma^{mn} b_m b_n = B_1$$
$$\tau_{12} = \sum \sigma^{mn} b_m c_n = C_1 = B_2$$
$$\tau_{22} = \sum \sigma^{mn} c_m c_n = C_2.$$

We may, for example, estimate $u$ and $v$ by solving the equations

$$B_1 u + C_1 v = X_1 - A_1$$
$$B_2 u + C_2 v = X_2 - A_2,$$

or we may set up regions in the $X_1$, $X_2$ plane for which certain decisions are
made. For example, when classifying an individual into one of three populations,
we might delineate regions, as in Fig. 2.
Then the particular individual would be classified as coming from population I,
II, or III, according to which region $X_1$, $X_2$ falls in. The individual points
shown in the figure represent the expected values of $X_1$, $X_2$ for each of the three
populations. No exhaustive investigation has been made for this situation, but
some fairly obvious methods are available for constructing such regions.
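
The first option, solving the two linear equations for $u$ and $v$, can be sketched as follows. The function name and the numerical values in the usage lines are hypothetical; the only fact taken from the text is that $B_2 = C_1$, so the system is symmetric.

```python
def estimate_uv(X1, X2, A1, A2, B1, C1, C2):
    """Solve B1*u + C1*v = X1 - A1 and C1*u + C2*v = X2 - A2 by Cramer's rule.

    B2 is not passed separately because B2 = C1 (the covariance of X1 and X2).
    """
    det = B1 * C2 - C1 * C1
    if det == 0:
        raise ValueError("X1 and X2 are proportional; u and v cannot be separated")
    r1, r2 = X1 - A1, X2 - A2
    u = (r1 * C2 - C1 * r2) / det
    v = (B1 * r2 - C1 * r1) / det
    return u, v


# Hypothetical constants, chosen only to illustrate the call.
A1, A2, B1, C1, C2 = 0.1, -0.2, 2.0, 0.5, 1.0
# An individual whose discriminant values sit exactly at the means for u = 1, v = 0.
u_hat, v_hat = estimate_uv(A1 + B1, A2 + C1, A1, A2, B1, C1, C2)
print(u_hat, v_hat)
```

A vanishing determinant corresponds to the case $\beta_1 \gamma_2 = \beta_2 \gamma_1$, in which $X_1$ and $X_2$ are proportional.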
With respect to significance tests when the $\sigma_{mn}$, $a_m$, $b_m$, $c_m$ are estimated from
samples, the whole gamut of multivariate analysis has to be run. Tests analogous
to (but more complicated than) $F$ tests exist for testing the significance
of the discrimination, the significance of a subset of the $x$'s, and the significance
of a theoretical pair $X_{1,0}$, $X_{2,0}$ (Wilks [41], [42], [43]).

Fig. 2. Classification Regions in $X_1$, $X_2$ Plane.

For some purposes a two-dimensional discriminant function $X_1$, $X_2$ may be
unsatisfactory. For example, we might suspect that $\beta_1 \gamma_2 = \beta_2 \gamma_1$ (or that the
relationship is nearly satisfied). Under these circumstances $X_1$ is (nearly)
proportional to $X_2$, and we would like to compute the best one-dimensional
discriminant function, even though we have started with two linear parameters
$u$ and $v$. Even if $\beta_1 \gamma_2 \neq \beta_2 \gamma_1$ we might still ask for the best one-dimensional
discriminant function, in order to rank our populations on the “best linear scale.”
If we define $Y$ as that linear combination of $x_1, x_2, \ldots, x_p$ which has the largest
multiple correlation with $u$ and $v$, we have generalized the simple one-dimensional
discriminant function in a second direction.

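Before proceeding, it is useful to recognize that $Y$, as defined above, must be a
function of $X_1$, $X_2$, since $X_1$ and $X_2$ together contain all the information about
$u$ and $v$ that can be obtained from the $x$'s.

Now suppose we consider an arbitrary linear combination $Y = \lambda_1 X_1 + \lambda_2 X_2$.
$Y$ correlates best with

$$\lambda_1(\tau_{11} u + \tau_{12} v) + \lambda_2(\tau_{12} u + \tau_{22} v) = (\lambda_1 \tau_{11} + \lambda_2 \tau_{12}) u + (\lambda_1 \tau_{12} + \lambda_2 \tau_{22}) v.$$

We now have to choose $\lambda_1$ and $\lambda_2$ to maximize this correlation. This correlation
will be maximized if we maximize the ratio of the variance of

$$(\lambda_1 \tau_{11} + \lambda_2 \tau_{12}) u + (\lambda_1 \tau_{12} + \lambda_2 \tau_{22}) v$$

(over the distribution of $u$ and $v$ values) to the variance of $Y$ for fixed $u$ and $v$.
Call the first quantity $S_1^2$, the second $S_2^2$. Then $S_2^2 = \lambda_1^2 \tau_{11} + 2 \lambda_1 \lambda_2 \tau_{12} + \lambda_2^2 \tau_{22}$
and $S_1^2$ is of the form $\lambda_1^2 \mu_{11} + 2 \lambda_1 \lambda_2 \mu_{12} + \lambda_2^2 \mu_{22}$ where

$$\mu_{11} = \tau_{11}^2 \sigma_{uu} + 2 \tau_{11} \tau_{12} \sigma_{uv} + \tau_{12}^2 \sigma_{vv}$$
$$\mu_{12} = \tau_{11} \tau_{12} \sigma_{uu} + (\tau_{12}^2 + \tau_{11} \tau_{22}) \sigma_{uv} + \tau_{12} \tau_{22} \sigma_{vv}$$
$$\mu_{22} = \tau_{12}^2 \sigma_{uu} + 2 \tau_{12} \tau_{22} \sigma_{uv} + \tau_{22}^2 \sigma_{vv}.$$

Maximizing $S_1^2 / S_2^2$ leads to the equations:

$$\lambda_1 \tau_{11} + \lambda_2 \tau_{12} = \frac{S_2^2}{S_1^2} (\lambda_1 \mu_{11} + \lambda_2 \mu_{12})$$
$$\lambda_1 \tau_{12} + \lambda_2 \tau_{22} = \frac{S_2^2}{S_1^2} (\lambda_1 \mu_{12} + \lambda_2 \mu_{22})$$

i.e.

$$\lambda_1 (\tau_{11} - \theta \mu_{11}) + \lambda_2 (\tau_{12} - \theta \mu_{12}) = 0$$
$$\lambda_1 (\tau_{12} - \theta \mu_{12}) + \lambda_2 (\tau_{22} - \theta \mu_{22}) = 0, \quad \text{with } \theta = S_2^2 / S_1^2.$$

It is thus seen that $\theta$ must satisfy the quadratic equation

$$(\tau_{11} - \theta \mu_{11})(\tau_{22} - \theta \mu_{22}) - (\tau_{12} - \theta \mu_{12})^2 = 0,$$

in order for solutions $\lambda_1$, $\lambda_2$ to exist. In general there will be two solutions.
Since $\theta$ as written here is the ratio of $S_2^2$ to $S_1^2$, the smaller root corresponds to
that linear combination $\lambda_1 X_1 + \lambda_2 X_2$ which has greatest multiple correlation with
$u$ and $v$, whereas the greater corresponds to that linear combination which has
least multiple correlation with $u$ and $v$. The reciprocal $1/\theta$ corresponds to
$R^2/(1 - R^2)$ for the regression of $\lambda_1 X_1 + \lambda_2 X_2$ on $u$, $v$.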
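
The quadratic can be solved directly. The sketch below takes the equations in the form $\lambda_1(\tau_{11} - \theta\mu_{11}) + \lambda_2(\tau_{12} - \theta\mu_{12}) = 0$, for which $\theta$ equals $S_2^2/S_1^2$ at a solution, so that the smaller root gives the largest ratio $S_1^2/S_2^2$. All numerical inputs and function names are hypothetical and serve only as an illustration.

```python
import math


def mu_from(tau11, tau12, tau22, s_uu, s_uv, s_vv):
    """Coefficients mu_11, mu_12, mu_22 of S1^2 as a quadratic form in lambda."""
    mu11 = tau11 ** 2 * s_uu + 2 * tau11 * tau12 * s_uv + tau12 ** 2 * s_vv
    mu12 = (tau11 * tau12 * s_uu + (tau12 ** 2 + tau11 * tau22) * s_uv
            + tau12 * tau22 * s_vv)
    mu22 = tau12 ** 2 * s_uu + 2 * tau12 * tau22 * s_uv + tau22 ** 2 * s_vv
    return mu11, mu12, mu22


def theta_roots(tau, mu):
    """Roots of (t11 - th*m11)(t22 - th*m22) - (t12 - th*m12)^2 = 0, ascending."""
    t11, t12, t22 = tau
    m11, m12, m22 = mu
    a = m11 * m22 - m12 ** 2
    b = -(t11 * m22 + t22 * m11 - 2 * t12 * m12)
    c = t11 * t22 - t12 ** 2
    root = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


def ratio(lam, tau, mu):
    """S1^2 / S2^2 for Y = lam[0]*X1 + lam[1]*X2."""
    l1, l2 = lam
    s1 = l1 * l1 * mu[0] + 2 * l1 * l2 * mu[1] + l2 * l2 * mu[2]
    s2 = l1 * l1 * tau[0] + 2 * l1 * l2 * tau[1] + l2 * l2 * tau[2]
    return s1 / s2


tau = (2.0, 0.5, 1.0)              # hypothetical tau_11, tau_12, tau_22
mu = mu_from(*tau, 1.0, 0.3, 2.0)  # hypothetical sigma_uu, sigma_uv, sigma_vv
theta_small, theta_large = theta_roots(tau, mu)

# lambda solving the first equation for the smaller root.
lam_best = (-(tau[1] - theta_small * mu[1]), tau[0] - theta_small * mu[0])
print(theta_small, theta_large, ratio(lam_best, tau, mu))
```

For this choice of inputs the ratio attained by `lam_best` equals `1 / theta_small`, and no other direction on a coarse grid exceeds it.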

In the general case with $s$ degrees of freedom corresponding to $w_1, w_2, \ldots, w_s$,
there is an $s$-dimensional discriminant function $(X_1, X_2, \ldots, X_s)$, and a set of
$s$ linear combinations for which $R^2/(1 - R^2)$ is stationary with respect to
$\lambda_1, \ldots, \lambda_s$.

The $s$ stationary values (corresponding to the roots of an equation of degree $s$),
arranged in decreasing order, permit construction of the best one-dimensional,
two-dimensional, $\ldots$, $(s - 1)$-dimensional discriminant functions.





Discussion of the relevant significance tests for these reduced discriminant 
functions is beyond the scope of this paper. Reference may be made to the 
work of Hotelling and Fisher.


NON-PARAMETRIC ESTIMATION II. STATISTICALLY EQUIVALENT
BLOCKS AND TOLERANCE REGIONS—THE CONTINUOUS CASE 

By John W. Tukey 
Princeton University 

1. Summary. Wald [2, 1943] extended the usefulness of tolerance limits to
the simplest multi-dimensional cases. His principle is here used to provide
many new ways of using a sample of $n$ to divide the range of the population into
$n + 1$ blocks of known behavior. The exact tolerance distribution for the
proportions of the population covered by these blocks is extended from the case
of a continuous probability density function to the case of a continuous cumulative
distribution function. Such an extension is needed in dealing completely
with multivariate cases even where the underlying distribution is as smooth as a
multivariate normal distribution.

The devices used in Paper I [1] to extend the usefulness of tolerance limits to
the case of a discontinuous underlying distribution will be applied in the next
paper of this series, with some extension, to extend the usefulness of these general
tolerance regions to the case of a discontinuous distribution. Some of these
results specialize into new results for the univariate case, although they do not
seem to have any immediate practical application.

The author wishes to acknowledge the stimulation given to his work on this
problem by Henry Scheffé, whose modesty has kept this paper from the joint
authorship of papers I [1, Scheffé and Tukey 1945] and IV (not yet written).

2. Introduction. Wald's great contribution to the theory of tolerance limits
was his method of successive elimination. As originally presented for a bivariate
situation it ran roughly as follows. Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
be a sample of $n$ from an arbitrary bivariate population. The type of tolerance
region to be used is determined by four preassigned integers, $k_1$, $k_2$, $k_3$, and
$k_4$. The procedure is as follows: Order the $n$ observations according to their $x$
values. Select the $k_1$ highest, and let the $x$ coordinate of the lowest of these $k_1$
be $x_u$. Select the $k_2$ lowest, and let the $x$ coordinate of the highest of these
$k_2$ be $x_l$. Discard these $k_1 + k_2$ selected observations, and order the remaining
$n - k_1 - k_2$ observations according to their $y$ values. Select the $k_3$ highest of
these remaining observations, and let the $y$ coordinate of the lowest of these $k_3$
be $y_u$. Select the $k_4$ lowest of these remaining observations, and let the $y$
coordinate of the highest of these $k_4$ be $y_l$. The tolerance region, consisting
of all points $(x, y)$, with $x_l < x < x_u$ and $y_l < y < y_u$, depends on the sample,
and, hence, so does the fraction of the population falling in (= covered by) this
region. Wald showed that the distribution of this fraction covered was independent
of the underlying bivariate distribution, so long as this latter distribution
had a continuous probability density function. He showed that the
distribution was the same as that arising in the one-dimensional case when a
tolerance region was set with the aid of $k_1 + k_2 + k_3 + k_4$ observations. (Numerical
approximation to these distributions will be discussed in Paper IV of this
series.)
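
The successive elimination steps translate almost line by line into code. The following sketch assumes that all four integers are at least 1 and that the coordinates contain no ties; the function name `wald_region` and the sample data are illustrative only.

```python
def wald_region(points, k1, k2, k3, k4):
    """Return (x_l, x_u, y_l, y_u) from Wald's successive elimination.

    points is a list of (x, y) pairs; k1..k4 are assumed to be >= 1.
    """
    if min(k1, k2, k3, k4) < 1:
        raise ValueError("this sketch assumes every k is at least 1")
    if k1 + k2 + k3 + k4 > len(points):
        raise ValueError("not enough observations for the requested k's")

    by_x = sorted(points, key=lambda pt: pt[0])
    x_l = by_x[k2 - 1][0]   # highest x among the k2 lowest
    x_u = by_x[-k1][0]      # lowest x among the k1 highest

    # Discard the k1 + k2 selected observations before looking at y.
    remaining = by_x[k2:len(by_x) - k1]
    by_y = sorted(remaining, key=lambda pt: pt[1])
    y_l = by_y[k4 - 1][1]   # highest y among the k4 lowest of those remaining
    y_u = by_y[-k3][1]      # lowest y among the k3 highest of those remaining
    return x_l, x_u, y_l, y_u


# Illustrative sample: x = 0..10 and y a fixed permutation of 0..10.
sample = [(i, (7 * i) % 11) for i in range(11)]
print(wald_region(sample, 1, 1, 1, 1))
print(wald_region(sample, 2, 1, 1, 2))
```

The slice that discards the first $k_1 + k_2$ selections before the $y$ step is the device the text singles out as essential.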

The important device in this process, and the one which makes the conclusion
possible, is the discarding of the $k_1 + k_2$ observations after they have played
their part by determining $x_l$ and $x_u$.

We shall shortly be able to describe this procedure of Wald's as a special case
of a more general procedure, but we shall first go back to the simplest one dimensional
case to explain some of our notions and terminology.

Consider the uniform distribution from 0 to 1, draw a sample of $n$, and let the
sample values, ordered according to size, be $t_1, t_2, \ldots, t_n$. These $n$ values divide
the interval from 0 to 1 into the following $n + 1$ parts $(0, t_1), (t_1, t_2), \ldots,$
$(t_{n-1}, t_n), (t_n, 1)$ which we shall call blocks. Since the joint distribution of the
$t_i$ is well known, that of the lengths of these $n + 1$ blocks is easily found. This
distribution of lengths would be unimportant, if it were not at the same time the
distribution of the fractions of the population covered by the blocks. As is
shown later, this distribution of fractions covered, or, more simply, of coverages,
has the following properties:

(i) the fractions covered add up to 1,

(ii) the distribution is completely symmetrical.

Property (ii) makes intuitive the result of Wilks [3, 1941] that the distributions
of the coverage of regions obtained

(a) by removing the $k_1 + k_2$ left-most blocks,

(b) by removing the $k_1$ left-most and the $k_2$ right-most blocks

are identical. The specific distribution obtained satisfies

(iii) if the coverages are taken as barycentric coordinates on an $n$-simplex,
the distribution over the simplex is uniform,

(iv) the sum of the coverages of any $k$ preselected blocks of the $n + 1$ has
the well-known distribution

$$\Pr\{\text{sum of } k \text{ coverages} < t\} = I_t(n - k + 1, k)$$

where $I_p(n, m)$ is the incomplete Beta function.

We shall call a set of blocks, derived from a sample, whose coverages behave in
this general way a set of statistically equivalent blocks. Normally this will be
abbreviated to se-blocks. (A precise definition is given in section 4.)

We shall concentrate much of our attention on all the blocks and their symmetrical
character, rather than on the tolerance region formed by deleting $k$
of them, since our results will then be applicable to many other problems.
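
Properties (i) and (ii) are easy to see in a small simulation. The sketch below draws uniform samples, forms the $n + 1$ block lengths, and compares the average coverage of the $k$ left-most blocks with that of one left-most plus $k - 1$ right-most blocks. By the symmetry in (ii), both averages should be close to $k/(n + 1)$. The seed, sample size, and number of trials are arbitrary choices for illustration.

```python
import random


def coverages(n, rng):
    """Lengths of the n + 1 blocks cut from (0, 1) by a uniform sample of n."""
    t = sorted(rng.random() for _ in range(n))
    edges = [0.0] + t + [1.0]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


rng = random.Random(1)
n, k, trials = 9, 3, 20000

total_left = 0.0    # coverage of the k left-most blocks
total_split = 0.0   # coverage of 1 left-most and k - 1 right-most blocks
worst_sum_error = 0.0
for _ in range(trials):
    c = coverages(n, rng)
    worst_sum_error = max(worst_sum_error, abs(sum(c) - 1.0))
    total_left += sum(c[:k])
    total_split += c[0] + sum(c[-(k - 1):])

mean_left = total_left / trials
mean_split = total_split / trials
print(mean_left, mean_split, k / (n + 1))
```

Only the means are compared here; a fuller check would compare the two empirical distributions, which property (ii) asserts are identical.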

Now we can generalize Wald's original procedure. Let $W_1, W_2, \ldots, W_n$
be a sample of $n$ (we shall not need to consider its distribution) and let
$\varphi_1, \varphi_2, \ldots, \varphi_n$ be $n$ numerically valued functions of $W$, possibly alike, possibly
distinct, such that $\varphi_1(W), \varphi_2(W), \ldots, \varphi_n(W)$ have a joint distribution. Proceed
as follows:

Order the $W_i$ according to the numbers $\varphi_1(W_i)$, select the $W_i$ for which $\varphi_1(W_i)$
is largest and denote it by $W_{i(1)}$. The first block contains all $W$ such that

(2.1a) $\varphi_1(W) > \varphi_1(W_{i(1)})$.

Discarding $W_{i(1)}$, order the remaining $W_i$ according to the values of $\varphi_2(W_i)$,
and select $W_{i(2)}$ as the one giving the largest value. The second block contains
all $W$ such that

(2.1b) $\varphi_1(W) < \varphi_1(W_{i(1)})$, $\varphi_2(W) > \varphi_2(W_{i(2)})$.

Continue this process. The $m$th block, for $m \le n$, will be defined by

(2.1m) $\varphi_j(W) < \varphi_j(W_{i(j)})$, $j = 1, 2, \ldots, m - 1$, $\varphi_m(W) > \varphi_m(W_{i(m)})$,

and the $(n + 1)$st block by

(2.1n) $\varphi_j(W) < \varphi_j(W_{i(j)})$, $j = 1, 2, \ldots, n$.

(A graphical example of this construction is given shortly.) This set of $n + 1$
blocks will be statistically equivalent whenever the cumulative distribution of
each $\varphi_i$ function is continuous.
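
The construction (2.1a) to (2.1n) can be sketched in a few lines. The two helper names below, `build_cuts` and `block_index`, are hypothetical; the sketch assumes there are no ties among the values $\varphi_j(W_i)$, which the continuity condition guarantees with probability one.

```python
def build_cuts(sample, phis):
    """Return the cut values phi_j(W_i(j)) chosen by successive elimination.

    At step j the maximiser of phi_j among the points not yet used is selected,
    its phi_j value is recorded, and the point is discarded.
    """
    remaining = list(sample)
    cuts = []
    for phi in phis:
        if not remaining:
            break
        best = max(remaining, key=phi)
        cuts.append(phi(best))
        remaining.remove(best)
    return cuts


def block_index(w, phis, cuts):
    """1-based index of the block containing w, or None if w lies on a boundary.

    Block m requires phi_j(w) < cut_j for j < m and phi_m(w) > cut_m; the last
    block, numbered len(cuts) + 1, requires phi_j(w) < cut_j for every j.
    """
    for j, (phi, cut) in enumerate(zip(phis, cuts), start=1):
        value = phi(w)
        if value > cut:
            return j
        if value == cut:
            return None
    return len(cuts) + 1


# One-dimensional illustration with four sample values and four functions.
sample = [1.0, 4.0, 2.0, 3.0]
phis = [lambda w: w, lambda w: -w, lambda w: w, lambda w: w]
cuts = build_cuts(sample, phis)
print(cuts)
print([block_index(w, phis, cuts) for w in (5.0, 0.5, 3.5, 2.5, 1.5)])
```

Supplying fewer functions than sample values is also handled: the loop simply stops after the last function, leaving one large remainder block.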

To specialize this to the case described above, let $W$ be a pair $(x, y)$ of numbers
and let

(i) the first $k_1$ $\varphi$'s be the $x$-coordinate of $W$,

(ii) the next $k_2$ $\varphi$'s be minus the $x$-coordinate of $W$,

(iii) the next $k_3$ $\varphi$'s be the $y$-coordinate of $W$,

(iv) the next $k_4$ $\varphi$'s be minus the $y$-coordinate of $W$,

(v) the remaining $\varphi$'s be arbitrary.

Then the first $k_1$ blocks will contain all $W$ for which

$$x = \varphi_j(W) > \varphi_j(W_{i(j)}), \quad j = 1, 2, \ldots, k_1,$$

that is, for which

$$x > x_u = \varphi_{k_1}(W_{i(k_1)}).$$

Similarly, the next $k_2 + k_3 + k_4$ blocks will contain all $W$ with

$$x < x_l,$$
$$y > y_u, \quad x_l < x < x_u,$$
$$y < y_l, \quad x_l < x < x_u,$$

respectively, and the removal of these $k_1 + k_2 + k_3 + k_4$ blocks leaves Wald's
tolerance region (plus the boundaries where $x = x_u$, $x = x_l$, $y = y_u$, $y = y_l$).
There would be no point in this more general wording, if it did not include
new cases of some interest. We give now, in graphic terms, an example of such
a case.

We deal with a sample of $n$ bivariate observations, which we think of as plotted
on a map so that we can use geographical language. The number $n$ is rather
large, and we wish to construct a tolerance region by deleting 12 blocks. We
proceed as follows:

Find the most northerly point, draw an East-West line through it, and shade
the area North of the line. Find the most easterly point in the unshaded area,
draw a North-South line through it, and shade the unshaded area East of the
line. Find the most southerly point (always working in the unshaded area),
draw an East-West line through it and shade the area South of the line. Find
the most westerly point, draw a North-South line through it, and shade the area
West of the line. Find the most northeasterly point, draw a NW-SE line through
it and shade the area northeast of the line. Find the most southeasterly point,
draw a NE-SW line through it, and shade the area southeast of the line. Repeat
this 6 times more, choosing in succession the most southwesterly, northwesterly,
northerly, easterly, southerly, and westerly points. The remaining points will
now lie in an unshaded area surrounded by a polygon, which will have 8 (or
perhaps fewer) sides. The inside of this polygon is the desired tolerance region.
Figure 1 shows the final result, starting from $n = 25$. The practicing statistician
is invited to try an example of his own with $n$ at least 100.

Fig. 1

Other newly accessible cases can easily be invented by the reader, after he considers
this example carefully.

The use of a single $W$ and $n$ functions $\varphi_i$ has two virtues; it simplifies notation
and frees the intuition, as compared with the use of $n$ chance quantities
$X_i = \varphi_i(W)$.

If the bivariate situation above were regarded as a 12-variate situation, where
the variates were, in order, $(y, x, -y, -x, x + y, x - y, -x - y, -x + y,$
$y, x, -y, -x)$ then the original Wald procedure with $k_1 = k_3 = \cdots = k_{23} = 1$;
$k_2 = k_4 = \cdots = k_{24} = 0$ would apply to construct the same region. Yet
even if $x$ and $y$ had a bivariate normal distribution, Wald's proof would not
apply without extension. For the 12-dimensional distribution is highly singular
(it is concentrated on a 2-dimensional plane in 12-dimensional space) and there
is no hope of a density function. An extension of Wald's result to the case
where the 12-dimensional joint cumulative distribution function is continuous
(as is the case in this example when $x$ and $y$ have a continuous joint cumulative)
is clearly needed.
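
The twelve variates listed above give a direct recipe for the map example. The sketch below peels off one extreme point per variate, in the stated order, and keeps the resulting cut values. The sample is drawn from an arbitrary bivariate normal distribution purely for illustration, and the function names are hypothetical.

```python
import random

# The twelve variates in the order given in the text: north, east, south, west,
# north-east, south-east, south-west, north-west, then north, east, south, west.
DIRECTIONS = [
    lambda p: p[1], lambda p: p[0], lambda p: -p[1], lambda p: -p[0],
    lambda p: p[0] + p[1], lambda p: p[0] - p[1],
    lambda p: -p[0] - p[1], lambda p: -p[0] + p[1],
    lambda p: p[1], lambda p: p[0], lambda p: -p[1], lambda p: -p[0],
]


def peel(points, functions):
    """Remove, for each function in turn, the remaining point that maximises it.

    Returns the list of cut values and the points left over at the end.
    """
    remaining = list(points)
    cuts = []
    for f in functions:
        extreme = max(remaining, key=f)
        cuts.append(f(extreme))
        remaining.remove(extreme)
    return cuts, remaining


def in_region(p, functions, cuts):
    """True when p lies strictly inside every cut, i.e. inside the polygon."""
    return all(f(p) < cut for f, cut in zip(functions, cuts))


rng = random.Random(2)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
cuts, inside = peel(points, DIRECTIONS)
print(len(inside), sum(in_region(p, DIRECTIONS, cuts) for p in points))
```

Each cut is a half-plane, so the region that survives all twelve cuts is the convex polygon described in the text, and the twelve discarded points lie on its bounding lines or in the shaded blocks.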

When we come to deal with the case where the cumulative need not be
continuous we shall meet a further difficulty, namely “ties”. But if, as in the
present case, the cumulative is continuous, it is easy to see that the probability
that $\varphi_i(W_j) = \varphi_i(W_k)$ for any $i$, $j$, $k$ is zero.

3. Terminology and notation. A quantity which has a probability distribution
we call a chance quantity (it has frequently been called a random variable).
The term chance quantity does not imply that its values are single real numbers;
they may be single real numbers (when we also speak of a real chance quantity),
sets of $n$ real numbers, or more general objects. The cumulative distribution
function, or cumulative, of a single real chance quantity, $X$, is defined by

$$F(t) = \Pr\{X < t\},$$

except perhaps at the discontinuities of $F$. We have used here the notation
$\Pr\{k(X)\}$ to indicate the probability that $k(X)$ holds, and we have followed our
policy of using capital letters for chance quantities and the corresponding
lower case letters for their values.

The set of values of $W$, or, as we shall say, the $W$-set, for which, for example
$\varphi(W) < 3$, will be denoted by

$$\{W \mid \varphi(W) < 3\}.$$

We shall wish to compute probabilities associated with one or more functions
of a chance quantity; usually we will emphasize that these functions shall be
measurable with respect to the probability measure underlying the distribution
of $W$ by asserting that they have a joint cumulative, which is defined by

$$F(t_1, t_2, \ldots, t_k) = \Pr\{\varphi_1(W) < t_1, \varphi_2(W) < t_2, \ldots, \varphi_k(W) < t_k\},$$

(except possibly at discontinuities of $F$) and which does not exist unless the $\varphi_i$
are measurable with respect to the unknown underlying distribution of $W$. In
cases where we neglect to remind the reader, it is still assumed that the functions
are measurable.

The coverage of a $W$-set, which may itself be a chance quantity, is defined by

Coverage of $S$ = $\Pr\{W \in S\}$.

When $S$ is a chance quantity, its coverage is also a chance quantity. The
barycentric simplex (of dimension $n$) is the set of points in $(n + 1)$-dimensional
Euclidean space $(t_1, t_2, \ldots, t_{n+1})$ with $t_1 + t_2 + \cdots + t_{n+1} = 1$ and $0 \le t_i \le 1$.
The name comes from the representation of the point $(t_1, t_2, \ldots, t_{n+1})$ as the
center of gravity (in mechanical terms) or mean (in statistical terms) of the distribution
where a fraction $t_i$ is concentrated at the $i$th vertex. (In order, the
vertices are $(1, 0, 0, \ldots, 0)$, $(0, 1, 0, \ldots, 0)$, etc.) The uniform distribution
on this simplex has an ($n$-dimensional) density

$$n! \, dt_1 \, dt_2 \cdots dt_n, \quad (0 \le t_1, t_2, \ldots, t_n, \; 1 - t_1 - t_2 - \cdots - t_n \le 1),$$

and the cumulative

$$T(x_1, x_2, \ldots, x_{n+1}) = n! \int \cdots \int dt_1 \, dt_2 \cdots dt_n,$$

where the integration is over the range where $0 \le t_i \le x_i$ and at the same time
$t_1 + t_2 + \cdots + t_n \le 1$.


4. The blocks determined by n values of W. We deal now with a population
of $W$'s (a probability measure $\mu$ on the space $T = \{w\}$), a family of functions
$\varphi_1, \varphi_2, \ldots, \varphi_m$ of $W$ with a joint cumulative (measurable with respect to $\mu$)
and a set of values $w_1, w_2, \ldots, w_n$, ($w_i \in T$).

(4.1) Definition. The set $w_1, w_2, \ldots, w_n$ and the functions $\varphi_1, \varphi_2, \ldots, \varphi_m$
define blocks as follows:

(4.2) $S_1 = \{w \mid \varphi_1(w) > a_1\}$
where $a_1 = \max_i \varphi_1(w_i) = \varphi_1(w_{i(1)})$, which defines $i(1)$.

(4.3) $S_2 = \{w \mid \varphi_1(w) < a_1,\ \varphi_2(w) > a_2\}$,

where $a_2 = \max_i \varphi_2(w_i) = \varphi_2(w_{i(2)})$, $i(2) \ne i(1)$, which defines $i(2)$. And in
general, for $1 < k \le \min(m, n)$,

(4.4) $S_k = \{w \mid \varphi_1(w) < a_1, \ldots, \varphi_{k-1}(w) < a_{k-1},\ \varphi_k(w) > a_k\}$,
where $a_k = \max_i \varphi_k(w_i) = \varphi_k(w_{i(k)})$, the maximum being taken over all $i$ except
$i(1), i(2), \ldots, i(k-1)$; and $i(k)$ being chosen distinct from all $i(j)$, $j < k$.

If $m \ge n$, then

(4.5) $S_{n+1} = \{w \mid \varphi_1(w) < a_1, \ldots, \varphi_n(w) < a_n\}$.

If $m < n$, then

(4.6) $S_{m|n+1} = \{w \mid \varphi_1(w) < a_1, \ldots, \varphi_m(w) < a_m\}$.

The result of this definition is to use $w_1, \ldots, w_n$ and $\varphi_1, \ldots, \varphi_m$ to define
$n + 1$ blocks (one more than there are $w$'s) in case there are enough functions,
and, in case there are not enough functions, to define one small block, $S_i$, for
each function plus one large remainder $S_{m|n+1}$. We notice

(4.7) Remark. The blocks of (4.1) are well defined unless $\varphi_i(w_j) = \varphi_i(w_k)$ for
some $i$, $j$, $k$ with $j \ne k$.
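The construction (4.2) to (4.6) can be sketched in code. The following is only an illustrative sketch, not part of the original paper: the names `build_blocks` and `block_of` are hypothetical, the sample is taken to be a list of real numbers, and ties (excluded by Remark above) are assumed not to occur.

```python
def build_blocks(sample, funcs):
    """Return (cuts, chosen) for the block construction of (4.1).

    cuts[k] is the cut value a_{k+1}; chosen[k] is the index i(k+1) of the
    sample point that attains it. Assumes no ties among the function values.
    """
    remaining = list(range(len(sample)))
    cuts, chosen = [], []
    # Use at most n functions, since each function discards one sample point.
    for phi in funcs[:len(sample)]:
        # Maximise phi over the sample points not yet discarded.
        best = max(remaining, key=lambda i: phi(sample[i]))
        cuts.append(phi(sample[best]))
        chosen.append(best)
        remaining.remove(best)
    return cuts, chosen


def block_of(w, cuts, funcs):
    """1-based index of the block containing w.

    Block k is the first k with phi_k(w) > a_k; if there is none, w lies in
    the final remainder block. Boundary points (a set of probability zero
    under a continuous cumulative) are sent onward to a later block.
    """
    for k, a in enumerate(cuts):
        if funcs[k](w) > a:
            return k + 1
    return len(cuts) + 1


# Usage: two functions and three sample values (the case m < n).
sample = [0.2, 0.9, 0.5]
funcs = [lambda w: w, lambda w: -w]
cuts, chosen = build_blocks(sample, funcs)
```

With these two functions the first block is the region above the largest observation and the second is the region below the smallest; everything in between forms the remainder block.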

5. Statement of results for the statistician. The central results can be stated
as follows:

(5.1) Theorem $A_{m|n+1}$. If $W_1, W_2, \ldots, W_n$ are a sample of $n$ from a
distribution, if $\varphi_1, \varphi_2, \ldots, \varphi_m$, ($m < n$), are $m$ functions such that

$\varphi_1(W), \varphi_2(W), \ldots, \varphi_m(W)$

have a joint distribution which has a continuous cumulative, and if the blocks
$S_1, S_2, \ldots, S_m$ and $S_{m|n+1}$ are defined as in (4.1), then

(i) the blocks are disjoint chance sets, uniquely defined with probability one,

(ii) the distribution of the coverages

$c_i = \Pr\{w \text{ in } S_i\}, \quad i = 1, 2, \ldots, m$

and

$c_{m|n+1} = \Pr\{w \text{ in } S_{m|n+1}\}$

is the same as that of $t_1, t_2, \ldots, t_m$ and $t_{m+1} + t_{m+2} + \cdots + t_{n+1}$ where the $t_i$
are uniformly distributed on the barycentric simplex with $n + 1$ vertices.
Conditions (5.1i) and (5.1ii) are the precise definition of a partial family of
statistically equivalent blocks of type $n + 1$ and an associated $(m|n+1)$ tolerance
region.

(5.2) Theorem $B_{n+1}$. If $W_1, W_2, \ldots, W_n$ are a sample of $n$ from a
distribution, and if $\varphi_1, \varphi_2, \ldots, \varphi_m$, ($m \ge n$), are $m$ functions such that

$\varphi_1(W), \varphi_2(W), \ldots, \varphi_m(W)$

have a joint distribution which has a continuous cumulative, and if the blocks
$S_1, S_2, \ldots, S_{n+1}$ are defined as in (4.1), then

(i) the blocks are disjoint chance sets, defined with probability one,

(ii) the distribution of the coverages

$c_i = \Pr\{w \text{ in } S_i\}, \quad i = 1, 2, \ldots, n + 1$

is the same as that of $t_1, t_2, \ldots, t_{n+1}$, where the $t_i$ are uniformly distributed
on the barycentric simplex with $n + 1$ vertices.

Conditions (5.2i) and (5.2ii) are the precise definition of a complete family of
statistically equivalent blocks. In Paper III we shall have to widen these notions
a little, and this form will then be qualified by the phrase "in the narrow sense".
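One consequence of the theorems that is easy to check by simulation is that every small block has expected coverage $1/(n+1)$, since each $t_i$ on the barycentric simplex has that mean. The sketch below is an illustration under stated assumptions, not a procedure from the paper: $W$ is taken uniform on $(0, 1)$, the two functions are $\varphi_1(w) = w$ and $\varphi_2(w) = -w$, and the function name and seed are arbitrary. Under these assumptions $S_1$ is the region above the largest observation and $S_2$ the region below the smallest.

```python
import random


def simulate_mean_coverages(n, trials, seed=0):
    """Average coverages of S_1 and S_2 over repeated samples of size n.

    Assumes W uniform on (0, 1), phi_1(w) = w and phi_2(w) = -w, so that the
    coverage of S_1 is 1 - max(w_i) and the coverage of S_2 is min(w_i).
    Requires n >= 2 so that the two cut points come from different w's.
    """
    rng = random.Random(seed)
    total_top = 0.0
    total_bottom = 0.0
    for _ in range(trials):
        w = [rng.random() for _ in range(n)]
        total_top += 1.0 - max(w)   # Pr{W > a_1}
        total_bottom += min(w)      # Pr{W < smallest observation}
    return total_top / trials, total_bottom / trials


top, bottom = simulate_mean_coverages(n=4, trials=20000, seed=1)
```

For $n = 4$ both averages should be close to $1/5$, up to Monte Carlo error.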

6. Statement of results for the measure theorist. The construction of (4.1)
maps the product $T^n \times U^n$ into $E^{n+1}$ where $T$ is the set of $w$'s (and hence $T^n$ is
the set of ordered $n$-tuples of $w$'s), $U$ is the space of all real-valued functions
defined over $T$, measurable with respect to a fixed probability measure $\mu$, and
possessing a continuous cumulative (i.e. $\mu(\{w \mid \varphi(w) = c\}) = 0$ for all real $c$),
and hence $U^n$ is the space of ordered $n$-tuples of such functions, and $E^{n+1}$ is
Euclidean $(n+1)$-dimensional space. More precisely, the mapping is into the
barycentric simplex with $n + 1$ vertices, a subset of $E^{n+1}$, and is well defined except
for a set in $T^n$ of measure zero with respect to $\mu^n$, the power measure of $\mu$. In
these terms, we may restate Theorem B as follows:

(6.1) Theorem $B_{n+1}$. Hold the $n$ functions $\varphi_1, \varphi_2, \ldots, \varphi_n$ and the probability
measure fixed, then $T^n$ is mapped into $B_n$ and the power measure $\mu^n$ is carried by
that mapping into a measure on $B_n$. This measure is always $n!$ times Lebesgue
measure.

7. Wald's principle. The essential principle behind Wald's process of
discarding observations is sufficiently fundamental to warrant a name of its own.
It can be stated, quite generally, in the two following forms:

(7.1) Wald's Principle (discrete form). Let $W$ be a chance quantity,
and consider samples of $n$. Fix disjoint $w$-sets $A_1, A_2, \ldots, A_m, B$. Consider
those samples of $n$ for which exactly one value falls in each $A_i$ and the remaining
$n - m$ fall in $B$. The distribution of the $n - m$ falling in $B$ is that of a random sample
of $n - m$ from the distribution of $W$ restricted to $B$ (i.e. $\mu_B(X) = [\mu(B)]^{-1}\mu(B \cap X)$).

(7.2) Wald's Principle (conditional form). Let $W$ be a chance quantity,
and $\varphi$ a function such that each value of $\varphi(W)$ has probability zero. Consider
samples of $n$. Then the conditional distribution of the $w_i$, given that

$\max_i \varphi(w_i) = a$,

is that of one $W_j$ with $\varphi(W_j) = a$ and a sample of $n - 1$ other $W_i$ from the
distribution of $W$ restricted to $B = \{w \mid \varphi(w) < a\}$.

(7.3) Central Lemma. Let $W$ be a chance quantity and let $\varphi_1, \ldots, \varphi_n$ be
functions with a joint cumulative such that $\varphi_i(w) = a$ has probability zero for each
$i$ and $a$ (i.e. the joint cumulative is continuous). Then the conditional
distribution of the remaining $n - k$ $w$'s, after $k$ blocks have been chosen according to (4.1),
is that of a sample from the distribution of $W$ restricted to

$B = \{w \mid \varphi_1(w) < a_1, \ldots, \varphi_k(w) < a_k\}$,
where $k = 1, 2, \ldots, n$.

The proofs of these statements are elementary and direct. To establish (7.1)
(here $k$ plays the role of $m$ in (7.1)) we have only to show that given two sets in
$B^{n-k}$, their probabilities on the assumption that one $w_i$ is in each $A_i$ are in the
ratio of their probabilities for an unrestricted sample of $n - k$. But the probability
of finding the $n - k$ $w_i$ in a set $R$, contained in $B^{n-k}$, and one $w_i$ in each $A_i$, is exactly

$$\frac{n!}{(n-k)!}\,\mu(A_1)\,\mu(A_2) \cdots \mu(A_k)\,[\mu(B)]^{n-k}$$

times the probability that $n - k$ $w_i$, known to be in $B^{n-k}$, will fall in $R$. This
establishes (7.1).

In order to prove (7.2) we must show that the probability of a set $R$ of
$n$-tuples $w_1, w_2, \ldots, w_n$ is the same whether calculated directly or calculated by
the proposed conditional distribution. To this end, it is natural to decompose
$R$ as follows:

$R = R(1) + R(2) + \cdots + R(n) + Z$,

where $R(i)$ contains those $(w_1, \ldots, w_n)$ in $R$ for which $\varphi(w_i) > \varphi(w_j)$ for all
$j \ne i$, and $Z$ contains the remaining $(w_1, \ldots, w_n)$, which must involve at least
one tie $\varphi(w_j) = \varphi(w_k)$, $j \ne k$. Since $Z$ has probability zero, it will suffice to
establish the equality of the two calculations for sets of the form $R(i)$, and
because of symmetry we may restrict ourselves to sets of the form $R(1)$.

Given an integer $N$, we decompose the range of $\varphi(w)$ into $Nn$ segments of equal
probability, which we may do because the cumulative of $\varphi$ is continuous. There
are then $Nn$ values $b_k$, ($b_0 = -\infty$, $b_{Nn} = +\infty$) such that

$\Pr\{b_{k-1} < \varphi(w) \le b_k\} = 1/Nn$.

We now decompose our set $R$ (which is of the form $R(1)$) as follows:

$R = R_1 + R_2 + \cdots + R_{Nn} + Y$,

where $R_k$ contains those $n$-tuples

$(w_1, \ldots, w_n)$ for which $b_{k-1} < \varphi(w_1) \le b_k$

and $\varphi(w_i) \le b_{k-1}$ for all $i > 1$. The remaining set $Y$ contains $n$-tuples where
the two largest $\varphi(w_i)$, ($i = 1$ and $i = i_0$), belong to the same interval. The
probability of this is less than

$$\frac{n(n-1)}{2}\left(\frac{1}{nN}\right)^2 \le \frac{1}{2N^2}$$


as calculated from the known distribution. Calculating from the conditional 
distribution, we find immediately a bound of 



where $A^*$ is a constant depending only on $n$. Thus, as $N$ increases, the
probability of the successive sets $Y$ tends to zero, calculated either way. To show
the equivalence of the two calculations it is now sufficient to show that they
agree for the sets $R_k$. But this is a case of (7.1) and the lemma is proved.

Now (7.3) follows by induction, applying (7.2) at each step.

8. Proof of theorems. We notice that Theorem $B_{n+1}$ is equivalent to Theorem
$A_{n|n+1}$, since, according to (4.1), $S_{n|n+1} = S_{n+1}$.

We have only to prove Theorem $A_{m|n+1}$, which we do by induction on $m$.
For $m = 1$, it is exactly Wilks' [3, 1941] original one-dimensional theorem, and
is known. Let us assume it for $m = k$ and demonstrate it for $m = k + 1$, for
by induction this will complete the proof.

We must deal with the blocks $S_1, S_2, \ldots, S_k, S_{k+1}$ and $S_{k+1|n+1}$, (notation
as in (4.1) and (5.1)). We need the obvious

(8.1) Lemma. Since the cumulative of $\varphi_{k+1}$ is continuous, the union of $S_{k+1}$
and $S_{k+1|n+1}$ differs from $S_{k|n+1}$ by a set of zero probability.

Hence

$c_{k|n+1} = c_{k+1} + c_{k+1|n+1}$.

Since we know from the induction hypothesis that $c_1, c_2, \ldots, c_k$ and $c_{k|n+1}$
have the correct joint distribution, we have only to show that $c_{k+1}$ and $c_1,
c_2, \ldots, c_k$ have the correct joint distribution. Fix $c_1, c_2, \ldots, c_k$. Then
$a_1, a_2, \ldots, a_k$ must be fixed, and so (7.3) applies to the $n - k$ $w_i$'s not
discarded after $a_1, a_2, \ldots, a_k$ have been fixed. The conditional distribution of
$c_{k+1}$ must be that of a fixed number $(1 - c_1 - c_2 - \cdots - c_k)$, which is the
probability attached to $S_{k|n+1}$, times the coverage of one block based on a sample
of $n - k$, since the remaining $n - k$ $w$'s behave like a sample.

Consider the very particular case where $w$ is uniformly distributed between
zero and one and $\varphi_i(w) = w$. All that we have said in the last paragraph applies:
the conditional distribution of $c_{k+1}$ given $c_1, c_2, \ldots, c_k$ is the same in the two
cases, hence the joint distribution of $c_1, c_2, \ldots, c_k, c_{k+1}$ is the same in both
cases. But in this very particular case the joint distribution is known to be
that required by Theorem $A_{k+1|n+1}$.
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SOME BASIC THEOREMS FOR DEVELOPING TESTS OF FIT FOR 
THE CASE OF THE NON-PARAMETRIC PROBABILITY 
DISTRIBUTION FUNCTION, I 

By Bradford F. Kimball 
State Department of Public Service, New York, N. Y.

1. Summary. In developing tests of fit based upon a sample $O_n(x_i)$ in the
case that the cumulative distribution function $F(X)$ of the universe of $X$'s is
not necessarily a function of a finite number of specific parameters (sometimes
known as the non-parametric case) it has been pointed out by several writers
that the "probability integral transformation" is a useful device (cf. [1]-[4]).

The author finds that a modification of this approach is more effective. This
modification is to use a transformation of ordered sample values $x_i$ from a random
sample $O_n(x_i)$ based on successive differences of the cdf values $F(x_i)$.

A theorem is proved giving a simple formula for the expected values of the
products of powers of these differences, where all differences from 1 to $n + 1$ are
involved in a symmetrical manner.

The moment generating function of the test function defined as the sum of
squares of these successive differences is developed and the application of such
a test function is briefly discussed.

2. Introduction. Let the sample values $x_i$ be ordered so that

(2.1) $x_i \le x_{i+1}$, ($i = 1, 2, \ldots, n - 1$).

Let $F_r$ denote the value of the cdf $F(X)$ associated with the $r$th ordered sample
value $x_r$. Thus

(2.2) $F_r = F(x_r)$.

Consider the following transformation of the ordered sample values $x_i$ based
upon the (hypothetically) known cumulative distribution function $F(X)$ which
will be taken as a continuous function of $X$ over its admissible range:

$u_1 = F_1$,

(2.3) $u_r = F_r - F_{r-1}$, ($r = 2, 3, \ldots, n$)

$u_{n+1} = 1 - F_n$.

The restrictions on $F_i$ are that
(2.4) $F_i \le F_{i+1}$, and $0 \le F_i \le 1$.

The above transformation (2.3) translates these conditions into the symmetrical
conditions

(2.5) $0 \le u_i$, and $u_1 + u_2 + \cdots + u_n + u_{n+1} = 1$.
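The transformation (2.3) is straightforward to compute. The sketch below is illustrative only; the function name `frequency_differences` is hypothetical, and the cdf is passed in as an ordinary Python callable assumed to be continuous and non-decreasing.

```python
def frequency_differences(sample, cdf):
    """Return u_1, ..., u_{n+1} of (2.3) for a sample and a known cdf F.

    The sample is sorted first, as in (2.1); the result satisfies (2.5):
    every u_i is non-negative and the u_i sum to one (up to rounding).
    """
    F = [cdf(x) for x in sorted(sample)]
    u = [F[0]]                                         # u_1 = F_1
    u += [F[r] - F[r - 1] for r in range(1, len(F))]   # u_r = F_r - F_{r-1}
    u.append(1.0 - F[-1])                              # u_{n+1} = 1 - F_n
    return u


# Usage: a sample of three from the uniform distribution on (0, 1),
# for which F(x) = x.
u = frequency_differences([0.7, 0.1, 0.4], lambda x: x)
```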

A one-to-one correspondence between $u_i$ and $F_i$ exists if one of the $u_i$ be
omitted, say $u_\beta$. With $u_\beta$ omitted, the Jacobian of the transformation from $F_i$ to $u_i$
has value unity. The probability density of the sample $O_n(x_i)$, with $x_i$ ordered,
is given by

(2.6) $P[O_n(x_i)]\,dO_n = n!\,dF_1\,dF_2 \cdots dF_n$.

Hence with $u_\beta$ omitted,

(2.7) $P[O_n(u_i)]\,dO_n = n!\,du_1\,du_2 \cdots du_{\beta-1}\,du_{\beta+1} \cdots du_{n+1}$.

The sample space of the $u_i$ with $u_\beta$ omitted, is that portion of the $n + 1$
Euclidean space of all the $u_i$ variables, bounded by the coordinate hyperplanes,
which is on the projection of the hyperplane (2.5) upon the hyperplane $u_\beta = 0$.
This is a region in the $n$-space of the $u_i$ with $u_\beta$ omitted, bounded by the
coordinate hyperplanes and the hyperplane

(2.8) $u_1 + u_2 + \cdots + u_{\beta-1} + u_{\beta+1} + \cdots + u_n + u_{n+1} = 1$.

Thus the formal integral of the pdf of the $u_i$ over sample space is

(2.9) $n! \int\!\!\int \cdots \int du_1 \cdots du_{\beta-1}\,du_{\beta+1} \cdots du_{n+1} = 1$

with $0 \le u_i$, and $u_i$ bounded above by the hyperplane (2.8).

It is now clear that both the pdf and the sample space of the $u_i$ (with $u_\beta$
omitted) are symmetrical in the $u_i$. This fact leads to complete symmetry of
the joint distribution function of any set of $u_i$, over $i = 1$ to $n + 1$ including $u_\beta$,
relative to the $u_i$ selected. Other interesting results are forthcoming.


3. Basic mathematical theorem. Using the techniques associated with the
Beta function, the expectation of the products of powers is found to be

(3.1) $E[u_r^p\,u_s^q\,u_t^w \cdots] = \dfrac{\Gamma(n+1)\,\Gamma(p+1)\,\Gamma(q+1)\,\Gamma(w+1) \cdots}{\Gamma(n + p + q + w + \cdots + 1)}$

where $r$, $s$, $t$, etc., are any set of different indices (for the present other than $\beta$)
from the integers 1 to $n + 1$, and $p$, $q$, $w$, etc., are any real numbers greater than
minus one. The relation (3.1) can further be generalized to the case where $u_\beta$
may be included. This will be proved for the case $n = 2$, with $p$, $q$ and $w$
taken as integers. The generalization can be concluded from inspection. Thus
with

$u_3 = 1 - u_1 - u_2$,

$$E[u_1^p u_2^q u_3^w] = 2! \int_0^1 u_2^q\,du_2 \int_0^{1-u_2} u_1^p (1 - u_1 - u_2)^w\,du_1$$

$$= 2! \int_0^1 u_2^q (1 - u_2)^{p+w+1}\,du_2 \int_0^1 v^p (1 - v)^w\,dv$$

$$= \frac{2!\,p!\,w!}{(p + w + 1)!} \int_0^1 u_2^q (1 - u_2)^{p+w+1}\,du_2 = \frac{2!\,p!\,q!\,w!}{(p + q + w + 2)!}.$$

Hence the theorem:

Theorem. Given a random sample of $n$ values of $X$ from a universe with cdf
$F(X)$ which is continuous over the range of $X$. With the sample values $x_i$ ordered
so that $x_i \le x_{i+1}$ define a set of $n + 1$ variables $u_i$ as the successive differences of
$F(x_i)$ by the relations (2.3). The expected value of the product of real powers greater
than minus one of any or all of the $u_i$, ($i = 1, 2, \ldots, n + 1$), is given by the
relation (3.1) above (not subject to the omission of $u_\beta$).
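Relation (3.1) is simple to evaluate numerically. The following sketch is an illustration, with the hypothetical function name `expected_product`; it uses only `math.gamma` from the Python standard library and takes the list of powers $p, q, w, \ldots$ (each greater than minus one) attached to distinct $u_i$.

```python
from math import gamma, isclose, factorial


def expected_product(n, powers):
    """E[u_r^p u_s^q ...] from (3.1) for a sample of size n.

    `powers` lists the exponents attached to distinct u_i; at most n + 1
    exponents are meaningful, and each must exceed -1.
    """
    numerator = gamma(n + 1)
    for p in powers:
        numerator *= gamma(p + 1)
    return numerator / gamma(n + sum(powers) + 1)


# Usage: the case n = 2 worked out in the text, with p = 1, q = 2, w = 3,
# compared with the closed form 2! p! q! w! / (p + q + w + 2)!.
value = expected_product(2, [1, 2, 3])
closed_form = (factorial(2) * factorial(1) * factorial(2) * factorial(3)
               / factorial(1 + 2 + 3 + 2))
```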

There are many interesting consequences of this theorem. Perhaps the most
striking is the following:

Corollary 1. Let a range $a(m, k)$ for positive integer $m$ be defined by

(3.2) $a(m, k) = F(x_{k+m}) - F(x_k)$

with $k = 0, 1, 2, \ldots, n$, and $m \le n + 1 - k$

under the convention

$F(x_0) = 0$, $F(x_{n+1}) = 1$.

The probability distribution of $a(m, k)$ is independent of $k$ and hence is the same as
that of $F(x_m)$.

Another interesting consequence (not new) is the following:

Corollary 2. The correlation of $u_i$ and $u_k$, $i \ne k$, is the same for all pairs
$(i, k)$ over the range of indices from 1 to $n + 1$, and has the value $-1/n$.
Introducing the notation

(3.3) $[n + r]_r = (n + r)(n + r - 1) \cdots (n + 1)$,
the corollary follows from the relationships

$E(u_i) = 1/(n + 1)$, $E(u_i^2) = 2/[n + 2]_2$, $E(u_i u_k) = 1/[n + 2]_2$.
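The arithmetic behind Corollary 2 can be checked exactly. The sketch below is illustrative only (the function name `correlation` is hypothetical); it uses exact rational arithmetic on the three expectations just quoted, with $[n+2]_2 = (n+2)(n+1)$.

```python
from fractions import Fraction


def correlation(n):
    """Correlation of u_i and u_k (i != k) from the moments below (3.3)."""
    mean = Fraction(1, n + 1)                    # E(u_i)
    second = Fraction(2, (n + 2) * (n + 1))      # E(u_i^2) = 2 / [n+2]_2
    cross = Fraction(1, (n + 2) * (n + 1))       # E(u_i u_k) = 1 / [n+2]_2
    variance = second - mean ** 2
    covariance = cross - mean ** 2
    # Both u_i and u_k have the same variance, so the correlation is cov/var.
    return covariance / variance
```

For example, `correlation(4)` evaluates to `Fraction(-1, 4)`.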

The fact that the correlation between any two frequency differences $u_i$ and $u_k$
is negative leads to the following more general relationship:

Corollary 3. For any set of different indices $i$, $j$, $k$, etc., and for any positive
numbers $p$, $q$, $r$, etc., the expectation of the product of the powers $p, q, r, \ldots$ of
$u_i, u_j, u_k, \ldots$ is less than the product of the expectations of the powers taken
separately:

(3.4) $E[u_i^p u_j^q u_k^r \cdots] < E(u_i^p) \cdot E(u_j^q) \cdot E(u_k^r) \cdots$

This follows from generalization of the relation

$$\frac{\Gamma(n+1)\,\Gamma(p+1)\,\Gamma(q+1)\,\Gamma(r+1)}{\Gamma(n + p + q + r + 1)} < \frac{[\Gamma(n+1)]^3\,\Gamma(p+1)\,\Gamma(q+1)\,\Gamma(r+1)}{\Gamma(n+p+1)\,\Gamma(n+q+1)\,\Gamma(n+r+1)}.$$

The above theorem suggests the possibility of test functions for fitted
distributions, relative to a universe with a cdf which, since it is merely conditioned by a
sufficient hypothesis for the theorem, may be of the non-parametric type.
A test function of the form

(3.5) $y = \sum_m u_i^p$, $p$ real and positive

might first come to mind. If $p = 1$, compensatory effects of deviations reduce
the efficiency of the test function. One is thus led first to consider the test
function (3.5) for the case $p = 2$.

4. The moments of the probability distribution of $y_m = \sum_m u_i^2$. We are
first concerned with the problem of the determination of the moments of the
function

(4.1) $y_m = \sum_m u_i^2$

where $i$ ranges over any particular fixed set of $m$ integers which for simplicity
is usually taken as the first $m$.

One first recalls the fact that the result is independent of which $m$ indices have
been selected, and that the expected value of any combination of powers is
independent of which specific subscripts of $u_i$ are involved.

Since the $u_i$ are correlated, principles of combinatory analysis are involved in
determining the moments of $y_m$. One possible way of obtaining the moments
is as follows:

Let $\nu_r$ denote the $r$th moment of $y_m$ about $y_m = 0$. Thus

(4.2) $E[(y_m)^r] = \nu_r = E[(\sum_m u_i^2)^r]$.

Now in the expansion of $(\sum_m u_i^2)^r$ the sum of the power indices of each term
is $2r$. Thus referring back to (3.1) and (3.3) it will be noted that the expected
value of each such term will have the common factor

$1/[n + 2r]_{2r}$.

Consider a general term of the expansion of $(\sum_m u_i^2)^r$:

$C_{r_1 r_2 \cdots r_k}\,u_{i_1}^{2r_1} u_{i_2}^{2r_2} \cdots u_{i_k}^{2r_k}$, with $r_1 + r_2 + \cdots + r_k = r$.

Clearly

$E(u_{i_1}^{2r_1} u_{i_2}^{2r_2} \cdots u_{i_k}^{2r_k}) = (2r_1)!\,(2r_2)! \cdots (2r_k)!/[n + 2r]_{2r}$
and the coefficient $C_{r_1 r_2 \cdots r_k}$ is the multinomial coefficient

$$C_{r_1 r_2 \cdots r_k} = \frac{r!}{r_1!\,r_2! \cdots r_k!}.$$

Now in the expansion of $(\sum_m u_i^2)^r$ group the terms which have the same set of
$k$ values of $r_i$, irrespective of which indices of $u_i$ are involved. The number of
such terms (since each involves $k$ different indices) is $\binom{m}{k}$. If $r_1, r_2, \ldots, r_k$
are all different each combination could be taken in $k!$ different ways. Thus with
$r$'s all different and fixed, the sum of all coefficients of terms with same
combination of $2r_i$ powers (irrespective of variation of indices of the $u_i$) is

$$\binom{m}{k}\,k!\,\frac{r!}{r_1!\,r_2! \cdots r_k!}.$$

This would then constitute the total multiplier for

$(2r_1)!\,(2r_2)! \cdots (2r_k)!/[n + 2r]_{2r}$

for a given set of $k$ $r$'s which are all different.

If some of the $r$'s are repeated, let $k_1, k_2, \ldots, k_s$ denote the number of repetitions
of each different $r_i$ ($k_i \ge 1$, and $k_1 + k_2 + \cdots + k_s = k$). Then each
combination of the $k$ $r$'s corresponding to a set of $k$ products could be taken in

$k!/(k_1!\,k_2! \cdots k_s!)$

different ways. Hence the lemma:

Lemma 1. Consider all admissible sets of $k$ different subscripts of $u_i$ and a fixed
set of values of $r_i = r_1, r_2, \ldots, r_k$ where

$r_1 + r_2 + \cdots + r_k = r$

such that $s$ of these $r$'s are different, and the number of repetitions in the set of $r$'s is
given by $k_1, k_2, \ldots, k_s$ ($k_i \ge 1$, and $k_1 + k_2 + \cdots + k_s = k$). The composite
coefficient of the terms in $\nu_r$ involving the factor

$(2r_1)!\,(2r_2)! \cdots (2r_k)!/[n + 2r]_{2r}$

is given by

(4.3) $\dbinom{m}{k}\,\dfrac{k!}{k_1!\,k_2! \cdots k_s!}\,\dfrac{r!}{r_1!\,r_2! \cdots r_k!}$.

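Examples of computation of $\nu_r$ by means of the above lemma. The first order
moment is given by

(4.4) $\nu_1 = E(\sum_m u_i^2) = m\,2!/[n + 2]_2$.

The second order moment is given by

$\nu_2 = E[(\sum_m u_i^2)^2] = C_1 E(u_i^4) + C_2 E(u_i^2 u_j^2)$,

and determining the values of $C_i$ from Lemma 1,

$\nu_2 = \left[m\,4! + \dbinom{m}{2}\dfrac{2!}{2!}\dfrac{2!}{1!\,1!}\,2!\,2!\right]\Big/[n + 4]_4$

or

(4.5) $\nu_2 = \left[m\,4! + 8\dbinom{m}{2}\right]\Big/[n + 4]_4 = \left[m + \dbinom{m}{2}\dfrac{1}{3}\right]\Big/\dbinom{n+4}{4}$.

Again for the third order moment,

$\nu_3 = C_1 E(u_i^6) + C_2 E(u_i^4 u_j^2) + C_3 E(u_i^2 u_j^2 u_k^2)$,

and using Lemma 1,

$\nu_3 = \left[m\,6! + \dbinom{m}{2}\dfrac{2!}{1!\,1!}\dfrac{3!}{2!\,1!}\,2!\,4! + \dbinom{m}{3}\dfrac{3!}{3!}\dfrac{3!}{1!\,1!\,1!}\,2!\,2!\,2!\right]\Big/[n + 6]_6$

$= \left[m\,6! + \dbinom{m}{2}\,2!\,3!\,4! + \dbinom{m}{3}\,2!\,2!\,2!\,3!\right]\Big/[n + 6]_6$

or

(4.6) $\nu_3 = \left[m + \dbinom{m}{2}\dfrac{2}{5} + \dbinom{m}{3}\dfrac{1}{15}\right]\Big/\dbinom{n+6}{6}$.

Similarly writing the fourth moment in the form

$\nu_4 = C_1 E(u_i^8) + C_2 E(u_i^6 u_j^2) + C_3 E(u_i^4 u_j^4) + C_4 E(u_i^4 u_j^2 u_k^2) + C_5 E(u_i^2 u_j^2 u_k^2 u_l^2)$

and using Lemma 1 it reduces to

(4.7) $\nu_4 = \left[m + \dbinom{m}{2}\dfrac{2}{7} + \dbinom{m}{2}\dfrac{3}{35} + \dbinom{m}{3}\dfrac{3}{35} + \dbinom{m}{4}\dfrac{1}{105}\right]\Big/\dbinom{n+8}{8}$.

Higher order moments of the probability distribution function may be
computed as desired.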
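For higher orders the bookkeeping of Lemma 1 is easily mechanized. The sketch below is an illustration, not the author's procedure: it enumerates the partitions $r_1 + \cdots + r_k = r$, applies the composite coefficient of Lemma 1 to each, and returns $\nu_r$ as an exact fraction. The names `partitions`, `bracket` and `moment` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial


def partitions(r, largest=None):
    """Yield the partitions of r as non-increasing tuples of positive integers."""
    if largest is None:
        largest = r
    if r == 0:
        yield ()
        return
    for first in range(min(r, largest), 0, -1):
        for rest in partitions(r - first, first):
            yield (first,) + rest


def bracket(n, r):
    """The symbol [n + r]_r = (n + r)(n + r - 1)...(n + 1) of (3.3)."""
    out = 1
    for j in range(1, r + 1):
        out *= n + j
    return out


def moment(r, m, n):
    """nu_r = E[(sum of m squared differences)^r] via Lemma 1."""
    total = 0
    for part in partitions(r):
        k = len(part)
        # k! / (k_1! ... k_s!): arrangements of the exponents over k indices.
        ways = factorial(k)
        for repeats in Counter(part).values():
            ways //= factorial(repeats)
        # r! / (r_1! ... r_k!): the multinomial coefficient of one term.
        multinomial = factorial(r)
        for ri in part:
            multinomial //= factorial(ri)
        # (2 r_1)! ... (2 r_k)!: numerator of the expectation of one term.
        factor = 1
        for ri in part:
            factor *= factorial(2 * ri)
        total += comb(m, k) * ways * multinomial * factor  # comb is 0 if k > m
    return Fraction(total, bracket(n, 2 * r))
```

As a check under these definitions, `moment(2, n + 1, n) - moment(1, n + 1, n) ** 2` gives the variance of the full sum of squares for a sample of size `n`.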

An alternate method of computing the moments of the distribution of this test
function is the following:

Consider a function $g_0(x)$ such that

(4.8) $\left[d^r g_0(x)/dx^r\right]_{x=0} = (2r)!$, $g_0(0) = 1$.

Thus

(4.9) $E[u_i^{2r}] = \left[d^r g_0(0)/dx^r\right]/[n + 2r]_{2r}$.

From the principles of combinatory analysis of linear operators, it follows that^1

(4.10) $E[(\sum_m u_i^2)^r] = \left[d^r [g_0(x)]^m/dx^r\right]_{x=0}\Big/[n + 2r]_{2r}$.

Although this is an enlightening analytical form, actual computations seem to be
simpler with the use of Lemma 1.
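The analytical form (4.10) can nevertheless be evaluated mechanically by treating $g_0(x)$ as a formal power series truncated at degree $r$. The sketch below is illustrative only, with the hypothetical name `moment_via_series`; it uses the fact that the coefficient of $x^j$ in $g_0$ is $(2j)!/j!$, which is an integer.

```python
from fractions import Fraction
from math import factorial


def moment_via_series(r, m, n):
    """nu_r from (4.10), using the formal series g0 truncated at x^r."""
    # Coefficient of x^j in g0(x): (2j)! / j!  (exact integer division).
    g0 = [factorial(2 * j) // factorial(j) for j in range(r + 1)]
    # Multiply g0 into the running product m times, dropping terms above x^r.
    power = [1] + [0] * r
    for _ in range(m):
        product = [0] * (r + 1)
        for a, ca in enumerate(power):
            for b, cb in enumerate(g0):
                if a + b <= r:
                    product[a + b] += ca * cb
        power = product
    # The r-th derivative at x = 0 is r! times the coefficient of x^r.
    derivative = factorial(r) * power[r]
    denominator = 1
    for j in range(1, 2 * r + 1):
        denominator *= n + j          # [n + 2r]_{2r}
    return Fraction(derivative, denominator)
```

For $r = 2$ the numerator produced this way is $24m + 4m(m-1)$, which agrees with the bracket in (4.5).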

^1 One way of seeing this is to first think of the $u_i$ as statistically independent. The
numerators of the resulting terms would be the same as in (4.10). When the $u_i$ are taken
as dependent, by virtue of (3.1) the numerators will remain the same while all denominators
will reduce to $[n + 2r]_{2r}$.

Moment generating function. The moment generating function of the
probability distribution of $y_m$ can be written as

(4.11) $E(e^{t y_m}) = G_0(t, m) = 1 + \sum_{r=1}^{\infty} \left\{\left[d^r [g_0(x)]^m/dx^r\right]_{x=0}\Big/[n + 2r]_{2r}\right\} t^r/r!$

with

$g_0(x) = 1 + 2!\,x + 4!\,x^2/2! + 6!\,x^3/3! + \cdots + (2r)!\,x^r/r! + \cdots$

$[n + 2r]_{2r} = (n + 2r)(n + 2r - 1) \cdots (n + 1)$.

Although $g_0(x)$ exists only as a formal power series, $G_0(t, m)$ is defined by (4.11)
as a power series with positive coefficients, converging for all $t$.


5. Some comments on test function, p = 2. At the present time the study of
the test function for $p = 2$ has not gone far enough to justify publication of
results. One difficulty is that although its asymptotic distribution function
appears to be normal, the convergence towards normalcy may be extremely slow
in some cases.

Furthermore there are indications that the case $m = n + 1$ will give the most
definitive results not only because the complete range of data is used, but also
because errors of Type II would in general have a less erratic effect.

For the case $m = n + 1$ the mean, variance and third and fourth reduced
moments (i.e. moments about the mean divided by corresponding power of $\sigma$)
are:

Case $m = n + 1$.

$$E(y_{n+1}) = \frac{2}{n + 2}, \qquad \sigma^2 = \frac{4n}{(n + 2)^2 (n + 3)(n + 4)},$$

(5.1) $$\alpha_3 = \mu_3/\sigma^3 = \frac{10n - 4}{(n + 5)(n + 6)} \sqrt{\frac{(n + 3)(n + 4)}{n}},$$

$$\alpha_4 = \mu_4/\sigma^4 = \frac{3(n + 3)(n + 4)\,(n^3 + 101n^2 + 14n - 8)}{n(n + 5)(n + 6)(n + 7)(n + 8)},$$

$$\alpha_4 - 3 = \frac{6(41n^4 + 241n^3 + 118n^2 - 784n - 48)}{n(n + 5)(n + 6)(n + 7)(n + 8)}.$$

If data is not grouped the test may be applied as follows: Given a function
$Q(X)$ which has been fitted to the cdf $F(X)$. From a random sample of size $n$
with $x_i$ ordered as in (2.1) compute the successive differences of $Q(x_i)$ to obtain
the variables $u_i^*$. Then consider the sum of the squares

$U^* = \sum_{n+1} u_i^{*2}$.

If $Q(X)$ is a true representation of $F(X)$ the variation of $U^*$ will follow that of
$y_{n+1}$. Thus the expected value of $U^*$, its variance etc. will be independent of
the fitted function $Q(X)$, which represents certain advantages over the $\chi^2$ test.
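The computation just described can be sketched as follows. This is a minimal illustration, not a complete testing procedure: the names `sum_of_squares_statistic`, `null_mean` and `null_variance` are hypothetical, the fitted function $Q$ is passed as a Python callable, and the null mean and variance are those given in (5.1) for $m = n + 1$. No rejection limit is computed, since the text does not supply the distribution of $y_{n+1}$ in closed form.

```python
def sum_of_squares_statistic(sample, fitted_cdf):
    """U* = sum of squared successive differences of Q at the ordered sample."""
    q = [fitted_cdf(x) for x in sorted(sample)]
    edges = [0.0] + q + [1.0]          # conventions Q(x_0) = 0, Q(x_{n+1}) = 1
    diffs = [b - a for a, b in zip(edges, edges[1:])]
    return sum(d * d for d in diffs)


def null_mean(n):
    """E(y_{n+1}) = 2 / (n + 2), from (5.1)."""
    return 2.0 / (n + 2)


def null_variance(n):
    """sigma^2 = 4n / [(n + 2)^2 (n + 3)(n + 4)], from (5.1)."""
    return 4.0 * n / ((n + 2) ** 2 * (n + 3) * (n + 4))


# Usage: three equally spaced values under a fitted uniform cdf Q(x) = x.
# Equal spacing gives the smallest possible value of U*, namely 1/(n + 1).
statistic = sum_of_squares_statistic([0.25, 0.5, 0.75], lambda x: x)
```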
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The effect of Type II errors can be roughly analyzed as follows: In considering 
the effect of such errors the testing procedure must be criticized from the point 
of view that 


$Q(X) \ne F(X)$.

For $m = n + 1$ it still is true that

$\sum u_i^* = 1$

which tends to act as a control upon $U^*$. For example set

$u_i^* = u_i + \chi_i$.

Then from the above relation it follows that
(5.2) $\sum \chi_i = 0$.

Write $U^*$ as

$U^* = \sum u_i^2 + \sum \chi_i^2 + 2 \sum u_i \chi_i$

(5.3) $= y_{n+1} + \sum \chi_i^2 + (2 \sum \chi_i)/(n + 1) + 2 \sum \chi_i\,\delta(u_i)$

where $\delta(u_i)$ denotes the variation of the true frequency differences from their
expected value $1/(n + 1)$.

The variation $\delta(u_i)$ will be to a considerable degree independent of $\chi_i$. Thus
the term $\sum \chi_i^2$ will in general tend to be larger than the last term on the right.
The third term on the right will be zero by virtue of (5.2), and hence $U^*$ will tend
to be larger than $y_{n+1}$. A similar effect upon the sampling variance of $U^*$ can
be noted. Hence an interval of rejection

$U^* \ge A$, $P[y_{n+1} \ge A] = \alpha$ = confidence level,

is pointed to.

On the other hand if $m < n + 1$ the condition (5.2) no longer holds, the term
$(2 \sum \chi_i)/(n + 1)$ of (5.3) will not be zero and in many cases would dominate the
other two error terms. Thus it is easily conceivable that one may have in the
case $m < n + 1$

$U_m^* < y_m$

even when the discrepancies $\chi_i$ are large. Hence in the case $m < n + 1$ choice
of confidence interval will require considerable care (see [1]).

Although the distribution of $y_{n+1}$ for small $n$ is decidedly non-normal, if the
test function is replaced by

(5.4) $r_{n+1} = \left(\sum [u_i - 1/(n + 1)]^2\right)^{1/2}$

it will be found that the probability density function takes on the normal
character quite rapidly with increasing $n$. Indeed the author has found that a
computed approximation to the probability density function of $r_{n+1}$ with $n = 4$ is
decidedly normal in character.


REFERENCES 

[1] J. Neyman, "Smooth test for goodness of fit," Skand. Aktuar. Tidskr., (1937) p. 149.

[2] E. S. Pearson, "The probability integral transformation for testing goodness of fit
and combining independent tests of significance," Biometrika, Vol. 30 (1938),
pp. 134-148.

[3] E. J. Gumbel, "Simple tests for given hypothesis," Biometrika, Vol. 32 (1942),
pp. 317-333.

[4] H. Scheffé and J. W. Tukey, "Non-parametric estimation, I. Validation of order
statistics," Annals of Math. Stat., Vol. 16 (1945), pp. 187-192.



AN ESSENTIALLY COMPLETE CLASS OF ADMISSIBLE DECISION
FUNCTIONS

By Abraham Wald 
Columbia University 

Summary. With any statistical decision procedure (function) there will be
associated a risk function $r(\theta)$ where $r(\theta)$ denotes the risk due to possible wrong
decisions when $\theta$ is the true parameter point. If an a priori probability
distribution of $\theta$ is given, a decision procedure which minimizes the expected value of
$r(\theta)$ is called the Bayes solution of the problem. The main result in this note
may be stated as follows: Consider the class $C$ of decision procedures consisting
of all Bayes solutions corresponding to all possible a priori distributions of $\theta$.
Under some weak conditions, for any decision procedure $T$ not in $C$ there exists
a decision procedure $T^*$ in $C$ such that $r^*(\theta) \le r(\theta)$ identically in $\theta$. Here $r(\theta)$
is the risk function associated with $T$, and $r^*(\theta)$ is the risk function associated
with $T^*$. Applications of this result to the problem of testing a hypothesis are
made.


1. Introduction. In some previous publications [1], [2] the author has
considered the following general problem of statistical inference: Let
$X = (X_1, \ldots, X_n)$ be a set of chance variables. Suppose that the only
information we have concerning the joint distribution function $F$ of these chance
variables is that $F$ is an element of a given class $\Omega$ of distribution functions.
Suppose, furthermore, that a class $D$ of possible decisions $d$ is given one of which
is to be made on the basis of an observation $x = (x_1, \ldots, x_n)$ on the chance
vector $X$. The problem is then to construct a function $d(x)$, called statistical
decision function, which associates with each sample point $x$ an element $d(x)$
of $D$ so that the decision $d(x)$ is made when the sample point $x$ is observed. A
statistical decision function $d(x)$ is defined over all possible points $x$ of the sample
space and for each sample point $x$ the value of the function is an element of $D$.
Each element $d$ of $D$ will usually be interpreted as a decision to accept the
hypothesis that the unknown distribution $F$ of $X$ belongs to a certain subclass
$\omega$ of $\Omega$. Different elements $d$ of $D$ correspond to different subclasses $\omega$ of $\Omega$.
The problem of testing the hypothesis $H$ that the unknown distribution
function $F$ belongs to a given subclass $\omega$ of $\Omega$, is contained as a special case in the
above general problem. The space $D$ will then contain only two elements
$d_1$ and $d_2$, where $d_1$ denotes the decision of accepting $H$ and $d_2$ denotes the
decision of rejecting $H$.

As in [1] and [2], we shall assume also here that $\Omega$ is a $k$-parameter family of
distribution functions. Then each element of $\Omega$ may be represented by a point
$\theta = (\theta_1, \ldots, \theta_k)$ of the $k$-dimensional Cartesian space.
The class $\Omega$ is then represented by a subset of the $k$-dimensional Cartesian space,
called parameter space. We shall, therefore, refer to $\Omega$ as the parameter space
and to its elements as parameter points.

The merits of any particular decision function $d(x)$ will usually depend on
the relative importance of the various possible errors caused by not selecting
the proper element $d$ of $D$. The relative importance of such errors has been
described in [1] and [2] by a weight function $W(\theta, d)$ defined over the product of
$\Omega$ and $D$. For any pair $(\theta, d)$ the value of $W(\theta, d)$ is non-negative and expresses
the loss caused by taking the decision $d$ when $\theta$ is the true parameter point.
For any given decision function $d(x)$ the expected value of the loss is given by

(1.1) $r(\theta) = \int_M W[\theta, d(x)]\,dF(x)$

where $M$ denotes the sample space and $F(x)$ is the joint cumulative distribution of
$X = (X_1, \ldots, X_n)$ corresponding to the parameter point $\theta$.

The function $r(\theta)$ is defined over the parameter space $\Omega$ and is called the risk
function. The shape of the risk function $r(\theta)$ will, in general, be affected by the
decision function $d(x)$ used. To put this dependence in evidence, we shall use
the symbol $r[\theta \mid d(x)]$ to denote the risk function $r(\theta)$ associated with the
decision function $d(x)$.

A decision function $d(x)$ is said to be uniformly better than the decision
function $d^*(x)$ if

(1.2) $r[\theta \mid d(x)] \le r[\theta \mid d^*(x)]$

for all $\theta$ and if there exists at least one point $\theta$ for which the strict inequality sign
holds in (1.2). A decision function $d(x)$ is said to be admissible if no other uniformly
better decision function exists.
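The two definitions can be illustrated on a finite grid of parameter points. The sketch below is an assumption-laden illustration rather than anything in the paper: each risk function is represented by the list of its values at the same finite set of $\theta$ values, and the names `uniformly_better` and `admissible_indices` are hypothetical. On a finite grid the check is only a necessary screen, since the definition requires the inequality for all $\theta$.

```python
def uniformly_better(risk_a, risk_b):
    """True if risk_a <= risk_b at every grid point, strictly at one or more.

    Both arguments are sequences of risk values taken at the same
    parameter points, as in (1.2).
    """
    pairs = list(zip(risk_a, risk_b))
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)


def admissible_indices(risks):
    """Indices of risk functions not uniformly bettered by another in the list."""
    keep = []
    for i, candidate in enumerate(risks):
        beaten = any(uniformly_better(other, candidate)
                     for j, other in enumerate(risks) if j != i)
        if not beaten:
            keep.append(i)
    return keep


# Usage: three hypothetical risk functions on a grid of three theta values.
risks = [
    [0.10, 0.20, 0.30],
    [0.10, 0.25, 0.30],   # never better than the first, worse at one point
    [0.05, 0.40, 0.10],   # better at some points, worse at others
]
surviving = admissible_indices(risks)
```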

A class $C$ of admissible decision functions will be said to be essentially complete
if for any decision function $d(x)$ not in $C$ there exists a decision function $d^*(x)$
in $C$ such that

$r[\theta \mid d^*(x)] \le r[\theta \mid d(x)]$

for all $\theta$.

In section 2 we shall formulate certain assumptions which will then be used 
in section 3 to derive an essentially complete class of admissible decision func¬ 
tions. In section 4 applications are made to the problem of testing a hypothesis. 

In a recent paper Lehmann [3] obtained an essentially complete class of
admissible tests for each hypothesis $H$ of a certain restricted class of simple
hypotheses. The restrictions imposed on $\Omega$ in Lehmann's paper are essentially
those formulated by Neyman [4], [5] to insure the existence of the type $A_1$
(uniformly most powerful unbiassed) test. Our definition of an essentially
complete class of admissible decision functions agrees with that given by Lehmann
when the problem is to test a hypothesis and the weight function $W(\theta, d)$ can
take only the values 0 and 1.

2. Assumptions. Throughout this paper we shall make the following
assumptions:

Assumption 1: The parameter space $\Omega$ is a bounded and closed subset of a
finite dimensional, say $k$-dimensional, Cartesian space.

We shall introduce the following convergence definition in the space D: a 

sequence [«-»), (m - 1 , 2, ■ • • , a d inf,), of elements of D is said to converge 
to the element d of D if fa 


lim W(6, d m ) = W(e, d ) 


uniformly in 9. 

Assumption 2: The space $D$ is compact and, for any $d$, $W(\theta, d)$ is a continuous
function of $\theta$.

Assumption 3: For any point $\theta$ of $\Omega$ the joint distribution function of
$X = (X_1, \cdots, X_n)$ admits a density function $p(x, \theta)$ for all points $x$ of the
$n$-dimensional Cartesian space $M$ (sample space). The density function $p(x, \theta)$
is assumed to be continuous in $x$ and $\theta$ jointly.

In what follows we shall mean by a distribution function $f(\theta)$ of $\theta$ a cumulative
distribution function for which $\int_\Omega df(\theta) = 1$ and for which $\int_\Omega W(\theta, d)\,df(\theta)$
is not zero identically in $d$.

Assumption 4: For any point $x$ of $M$, except perhaps for a set of measure
zero, and for any cumulative distribution function $f(\theta)$ there exists one and
only one element of $D$ for which the expression

(2.1)  $\int_\Omega W(\theta, d)\,p(x, \theta)\,df(\theta)$

takes its minimum value with respect to $d$.

Assumptions 1 and 3 in this paper are exactly the same as Assumptions 1 and 3
in [2]. The formulation of Assumptions 2 and 4 is somewhat different from
that given in [2]. This is mainly due to the fact that in [2] the space $D$ has the
same elements as $\Omega$, while here this is not necessarily so. It can be verified
without difficulty that this slight modification of the assumptions does not
affect in any way the validity of the results obtained in [2]. Thus, we shall be
able to make use of any theorems proved in [2] for the purposes of the present
paper.

3. Derivation of an essentially complete class of admissible decision functions.
For any distribution function $f(\theta)$ defined over $\Omega$ and for any sample
point $x$ let $d(x, f)$ denote the element of $D$ for which the expression (2.1) takes
its minimum value. It follows easily from the definition of $f(\theta)$ and $d(x, f)$ that

(3.1)  $\int_\Omega r[\theta \mid d(x, f)]\,df(\theta) \le \int_\Omega r[\theta \mid d^*(x)]\,df(\theta)$

for any decision function $d^*(x)$. If we interpret $f(\theta)$ as an a priori probability
distribution of $\theta$, inequality (3.1) says that the expected value of $r(\theta)$ takes its
minimum value for the decision function $d(x, f)$. We shall refer to $d(x, f)$ as
the Bayes solution of the problem corresponding to the a priori probability
distribution $f(\theta)$.
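
As an illustration only, the definition of the Bayes solution $d(x, f)$ can be sketched numerically under the simplifying assumption, not made in the text, that both $\Omega$ and $D$ are finite, so that the integral (2.1) reduces to a finite sum. The function name `bayes_decision`, the binomial density and the squared-error weight function below are hypothetical choices made for the example.

```python
from math import comb

def bayes_decision(x, thetas, prior, decisions, weight, density):
    """Return the element d of a finite D that minimizes the finite analogue
    of (2.1): sum over theta of W(theta, d) * p(x, theta) * f(theta)."""
    def expression_2_1(d):
        return sum(weight(t, d) * density(x, t) * f for t, f in zip(thetas, prior))
    # Assumption 4 postulates a unique minimizer; min() returns the first on ties.
    return min(decisions, key=expression_2_1)

# Hypothetical setup: x successes in 10 trials, theta the success probability.
thetas = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]          # a priori distribution f(theta)
decisions = [0.2, 0.5, 0.8]            # D taken to have the same elements as Omega
weight = lambda t, d: (t - d) ** 2     # assumed weight function W(theta, d)
density = lambda x, t: comb(10, x) * t ** x * (1 - t) ** (10 - x)

for x in (2, 5, 8):
    print(x, bayes_decision(x, thetas, prior, decisions, weight, density))
```

With this setup the decision follows the observed proportion: 0.2 for 2 successes, 0.5 for 5 and 0.8 for 8.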

We shall now prove the following theorem. 

Theorem 3.1. The class $C$ of all Bayes solutions $d(x, f)$ corresponding to all
possible a priori distributions $f(\theta)$ is an essentially complete class of admissible
decision functions.

Proof. First we show that for any distribution $f(\theta)$ the decision function
$d(x, f)$ is admissible. Let $d(x)$ be a decision function such that

$r[\theta \mid d(x)] \le r[\theta \mid d(x, f)]$

for all $\theta$. Then

(3.2)  $\int_\Omega r[\theta \mid d(x)]\,df(\theta) \le \int_\Omega r[\theta \mid d(x, f)]\,df(\theta)$.

From the definition of $d(x, f)$ it follows that the equality sign must hold in
(3.2), i.e.,

(3.3)  $\int_\Omega r[\theta \mid d(x)]\,df(\theta) = \int_\Omega r[\theta \mid d(x, f)]\,df(\theta)$.

From the second half of Theorem 4.2 in [2] it then follows that

$r[\theta \mid d(x)] = r[\theta \mid d(x, f)]$

for all $\theta$. Hence $d(x, f)$ is an admissible decision function.

We shall now show that the class $C$ of decision functions $d(x, f)$ corresponding
to all possible a priori distributions $f(\theta)$ is essentially complete. Let $d_0(x)$ be
any decision function not in the class $C$. The essential completeness of the
class $C$ is proved if we can show that there exists a distribution $f(\theta)$ such that

(3.4)  $r[\theta \mid d(x, f)] \le r[\theta \mid d_0(x)]$

for all $\theta$.

To prove (3.4) we shall consider the weight function

(3.5)  $W^*(\theta, d) = W(\theta, d) - r[\theta \mid d_0(x)] + \max_\theta r[\theta \mid d_0(x)]$.

The maximum of $r[\theta \mid d_0(x)]$ exists, since according to Theorem 4.1 in [2] $r[\theta \mid d_0(x)]$
is a continuous function of $\theta$. Clearly, Assumptions 1-4 remain valid if we
replace $W(\theta, d)$ by $W^*(\theta, d)$. Let $r^*[\theta \mid d(x)]$ denote the risk function associated
with the decision function $d(x)$ if the weight function is given by $W^*(\theta, d)$.
According to Theorem 5.2 in [2] there exists a decision function $d^*(x)$ such that

(3.6)  $\max_\theta r^*[\theta \mid d^*(x)] \le \max_\theta r^*[\theta \mid d(x)]$




for any decision function $d(x)$. Since

$\max_\theta r^*[\theta \mid d_0(x)] = \max_\theta r[\theta \mid d_0(x)]$

it follows from (3.6) that

(3.7)  $\max_\theta r^*[\theta \mid d^*(x)] \le \max_\theta r[\theta \mid d_0(x)]$.

Relations (3.5) and (3.7) imply

(3.8)  $r[\theta \mid d^*(x)] \le r[\theta \mid d_0(x)]$

for all $\theta$.
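
The passage from (3.5) and (3.7) to (3.8) can be written out as follows. This is only a sketch, and it assumes the usual form of the risk, $r[\theta \mid d(x)] = \int_M W(\theta, d(x))\,p(x, \theta)\,dx$ with $\int_M p(x, \theta)\,dx = 1$, so that terms of $W^*$ depending on $\theta$ alone pass unchanged through the integration over $x$; the symbol $\theta'$ is introduced here merely to keep the maximization variable apart from the fixed point $\theta$.

```latex
\begin{align*}
r^*[\theta \mid d(x)] &= r[\theta \mid d(x)] - r[\theta \mid d_0(x)]
      + \max_{\theta'} r[\theta' \mid d_0(x)]
      && \text{for any } d(x), \text{ by (3.5)}, \\
r^*[\theta \mid d_0(x)] &= \max_{\theta'} r[\theta' \mid d_0(x)]
      && \text{for every } \theta, \\
r[\theta \mid d^*(x)] - r[\theta \mid d_0(x)] + \max_{\theta'} r[\theta' \mid d_0(x)]
      &= r^*[\theta \mid d^*(x)]
      \le \max_{\theta'} r^*[\theta' \mid d^*(x)]
      \le \max_{\theta'} r[\theta' \mid d_0(x)]
      && \text{by (3.7)}.
\end{align*}
```

Cancelling the common term $\max_{\theta'} r[\theta' \mid d_0(x)]$ from the two ends of the last line leaves $r[\theta \mid d^*(x)] \le r[\theta \mid d_0(x)]$ for every $\theta$.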

For any distribution $f(\theta)$ we shall denote by $d^*(x, f)$ the Bayes solution of
the problem corresponding to the a priori distribution $f(\theta)$ when the weight
function is given by $W^*(\theta, d)$. Since $W^*(\theta, d) - W(\theta, d)$ depends only on $\theta$
but not on $d$, one can easily verify that $d^*(x, f) = d(x, f)$. It follows from
Theorems 4.4 and 5.1 in [2] that there exists a distribution $f(\theta)$, the so-called
least favorable distribution, such that (3.6) remains valid if we replace $d^*(x)$
by $d^*(x, f)$. Thus we can put

(3.9)  $d^*(x) = d^*(x, f) = d(x, f)$.

Hence, from (3.8) we obtain

$r[\theta \mid d(x, f)] \le r[\theta \mid d_0(x)]$

for all $\theta$. This completes the proof of Theorem 3.1.

4. Applications to the problem of testing a hypothesis. In this section we
shall apply the results of the preceding section to the problem of testing the
hypothesis $H$ that the true parameter point is included in a given subset $\omega$ of $\Omega$.
We shall assume that $\omega$ is an open subset of $\Omega$. The space $D$ consists now only
of two elements, $d_1$ and $d_2$, where $d_1$ denotes the decision of accepting $H$ and
$d_2$ denotes the decision of rejecting $H$.

We shall assume that $W(\theta, d_1)$ is equal to zero for points $\theta$ in the interior
or on the boundary of $\omega$, and positive elsewhere. Similarly, $W(\theta, d_2)$ will be
assumed to be positive for points $\theta$ inside $\omega$ and zero outside $\omega$. For any a priori
distribution $f(\theta)$ the Bayes solution is given by the following test: We reject
the hypothesis $H$ if (and only if)¹

(4.1)  $\int_{\Omega - \omega} W(\theta, d_1)\,p(x, \theta)\,df(\theta) > \int_{\omega} W(\theta, d_2)\,p(x, \theta)\,df(\theta)$.

Thus, the class $C$ of regions (4.1), corresponding to all possible distributions
$f(\theta)$, is an essentially complete class of admissible critical regions.
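
A minimal sketch of the test (4.1) may help fix ideas. It assumes, purely for illustration, a prior concentrated on finitely many parameter points (so that the integrals become sums), a single unit-variance normal observation, and weights equal to 1 wherever they are positive; the names `normal_density` and `rejects_hypothesis` are hypothetical.

```python
from math import exp, pi, sqrt

def normal_density(x, theta):
    """Assumed density p(x, theta): one normal observation with mean theta, variance 1."""
    return exp(-0.5 * (x - theta) ** 2) / sqrt(2.0 * pi)

def rejects_hypothesis(x, prior, in_omega, w_accept, w_reject, density=normal_density):
    """Finite-prior analogue of (4.1). `prior` maps each theta to its a priori
    probability; `w_accept` plays the role of W(theta, d_1) and `w_reject`
    that of W(theta, d_2)."""
    left = sum(w_accept(t) * density(x, t) * f
               for t, f in prior.items() if not in_omega(t))
    right = sum(w_reject(t) * density(x, t) * f
                for t, f in prior.items() if in_omega(t))
    return left > right

# Hypothetical example: omega is the set theta < 0, prior mass 1/2 at -1 and at +1.
prior = {-1.0: 0.5, 1.0: 0.5}
in_omega = lambda theta: theta < 0
unit_weight = lambda theta: 1.0

print(rejects_hypothesis(0.5, prior, in_omega, unit_weight, unit_weight))
print(rejects_hypothesis(-0.5, prior, in_omega, unit_weight, unit_weight))
```

In this example the hypothesis is rejected for $x = 0.5$ and accepted for $x = -0.5$, since the rule reduces to rejecting when the observation is positive.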

¹ Whether the equality sign is included or not in (4.1) is of no consequence, since by
Assumption 4 the measure of the set of points $x$ for which the equality holds in (4.1) is zero.

For any critical region $R$ we shall denote the probability that the sample $x$
will fall in $R$ when $\theta$ is true by $P(\theta \mid R)$. It follows from Lemma 4.4 in [2] and
Assumption 3 that $P(\theta \mid R)$ is a continuous function of $\theta$ for any region $R$.
Since $W(\theta, d_1)$ is positive in the interior of $\Omega - \omega$, and $W(\theta, d_2)$ is positive in $\omega$,
the class $C$ of regions defined in (4.1) will have the following properties:

(a) For any region $R$ outside the class $C$ there exists a region $R^*$ in $C$ such that

$P(\theta \mid R^*) \le P(\theta \mid R)$ in $\omega$

and

$P(\theta \mid R^*) \ge P(\theta \mid R)$ in $\Omega - \omega$.

(b) If $R$ and $R^*$ are members of $C$ such that

$P(\theta \mid R^*) \le P(\theta \mid R)$ in $\omega$

and

$P(\theta \mid R^*) \ge P(\theta \mid R)$ in $\Omega - \omega$,

then

$P(\theta \mid R^*) = P(\theta \mid R)$ for all $\theta$.

For any distribution $g(\theta)$ consider the critical region consisting of all sample
points $x$ satisfying

(4.2)  $\int_{\Omega - \omega} p(x, \theta)\,dg(\theta) > \int_{\omega} p(x, \theta)\,dg(\theta)$.


Let $C^*$ be the class of regions (4.2) corresponding to all possible distributions $g(\theta)$.
One can easily verify that any region in $C$ is also a member of $C^*$. Thus, the
following theorem holds:

Theorem 4.1. Suppose that Assumptions 1 and 3 are fulfilled and $\omega$ is an open
subset of $\Omega$. Suppose, furthermore, that for any distribution $g(\theta)$ the set of sample
points $x$ satisfying the equation

$\int_{\Omega - \omega} p(x, \theta)\,dg(\theta) = \int_{\omega} p(x, \theta)\,dg(\theta)$

has the measure zero. Then, for any region $R$ outside the class $C^*$ there will be a
region $R^*$ in $C^*$ such that

$P(\theta \mid R^*) \le P(\theta \mid R)$ in $\omega$

and

$P(\theta \mid R^*) \ge P(\theta \mid R)$ in $\Omega - \omega$.


Addition at proof reading: After this paper was sent to the printer, the author
obtained a generalization of Theorem 3.1 to sequential decision functions, as well as
some other results. They will appear in a forthcoming issue of Econometrica.

DISCRIMINATING BETWEEN BINOMIAL DISTRIBUTIONS

By Paul G. Hoel 
University of California at Los Angeles 

1. Summary. Given a set of $k$ random samples, $x_1, \cdots, x_k$, from a
binomial distribution with parameters $p$ and $n$, it is shown that the familiar
binomial index of dispersion

$z = \dfrac{\sum_{i=1}^{k} (x_i - \bar{x})^2}{\bar{x}\,(1 - \bar{x}/n_0)}$

yields an approximate best critical region independent of $p$ for testing the
hypothesis $n = n_0$ against the alternative hypothesis $n > n_0$, provided $\bar{x}$ and
$n_0 - \bar{x}$ are not small. Because of the nature of the test, its optimum properties
also apply to testing whether the data came from a binomial population with
$n = n_0$ or from a Poisson population.

2. Introduction. A problem of considerable interest in certain fields is that 
of deciding whether a set of observations should be treated as having come from 
either a binomial population or from a Poisson population. Although there was 
much discussion a few years ago concerning the best method for making such a 
decision [1], [2], [3], no solution of the problem was presented. In this paper a 
test that possesses certain optimum properties is derived for discriminating 
between two binomial populations. This test, however, is also capable of solving 
the problem of how to discriminate between a binomial and a Poisson population. 
The methods that are employed in the derivation of this test are similar to those 
of an earlier paper [4] in which the problem of discriminating between two Poisson 
populations was studied. 


3. Similar regions. Let $n$ denote the number of trials and $p$ the probability
of success in a single trial for a binomial distribution. Let $x_1, \cdots, x_k$ represent
the observed frequencies in $k$ random samples from this binomial population.
Now consider the two alternative hypotheses

$H_0$: $n = n_0$, $p = p_0$

and

$H_1$: $n = n_1 > n_0$, $p = p_1$.

The purpose of this paper is to construct a test for discriminating between the two
values of $n$ regardless of the values of $p$; however, it is convenient to begin with
these more restrictive hypotheses.

For the purpose of finding a critical region for testing $H_0$ against $H_1$, the $x_i$
will be treated as the coordinates of a point in $k$ dimensions. The probability of
obtaining the particular point $x_1, \cdots, x_k$ when $H_0$ is true will be denoted by
$P_0[x_i]$. Since the probability of obtaining $x$ successes in $n$ trials is given by

$\dfrac{n!}{x!\,(n - x)!}\,p^x q^{n - x}$,

it follows that

(1)  $P_0[x_i] = \dfrac{(n_0!)^k\, p_0^{\sum x_i}\, q_0^{\sum (n_0 - x_i)}}{\prod_{i=1}^{k} x_i!\,(n_0 - x_i)!}$.

In searching for a critical region that will be independent of $p_0$, it is illuminating
to study the methods that were designed by Neyman and Pearson [5] for
continuous distributions. These methods suggest that one should look for critical
regions on the surfaces $\sum_{i=1}^{k} x_i = \text{constant}$. For this reason, instead of
using (1) for constructing critical regions, it is desirable to study the conditional
probability distribution of the points lying in the plane $\sum_{i=1}^{k} x_i = N$, where $N$ is a
positive integer not exceeding $kn_0$. The conditional probability of obtaining
the point $x_1, \cdots, x_k$, when the point is restricted to lie in the plane $\sum x_i = N$,
will be denoted by $P_0[x_i \mid N]$. Its value may be obtained by dividing the
probability (1) by the probability that the point will lie in the plane $\sum x_i = N$. If
this latter probability is denoted by $P_0[N]$, then

(2)  $P_0[x_i \mid N] = \dfrac{P_0[x_i]}{P_0[N]}$.

Since the sum of $k$ independent variables each possessing the same binomial
distribution has a binomial distribution with $n$ replaced by $kn$, it follows that $N$
possesses a binomial distribution and that

(3)  $P_0[N] = \dfrac{(kn_0)!}{N!\,(kn_0 - N)!}\, p_0^{N} q_0^{kn_0 - N}$.

If (1) and (3) are substituted in (2), it will reduce to

(4)  $P_0[x_i \mid N] = \dfrac{(n_0!)^k\, N!\,(kn_0 - N)!}{(kn_0)!\, \prod_{i=1}^{k} x_i!\,(n_0 - x_i)!}$.

This conditional probability distribution in the plane $\sum x_i = N$ is independent
of $p_0$ and therefore may serve as the basis for constructing a critical region that
is independent of $p_0$ for testing $H_0$ against $H_1$. It will therefore be possible to
test the less restrictive hypothesis

$H_0'$: $n = n_0$

against

$H_1'$: $n = n_1 > n_0$.

4. Best critical region. Although a best critical region does not exist for
testing $H_0'$ against $H_1'$, it is helpful to proceed as though one did.

If a critical region of size $\alpha$ could be selected in each plane $\sum_{i=1}^{k} x_i = N$,
$(N = 0, 1, \cdots, kn_0)$, then the totality of such critical regions would constitute
a critical region of size $\alpha$ that is independent of $p_0$ and which therefore could be
used to test $H_0'$ against $H_1'$. For, if $P_0[X \in \text{C.R.}]$ denotes the probability that
the sample point, which will be denoted by $X$, will lie in the critical region, it
follows that

(5)  $P_0[X \in \text{C.R.}] = \sum_{N=0}^{kn_0} P_0[N]\,P_0[X \in \text{C.R.} \mid N] = \alpha \sum_{N=0}^{kn_0} P_0[N] = \alpha$.

This last equality follows from the fact that the sample point must lie in one of
the planes $\sum_{i=1}^{k} x_i = N$, $(N = 0, 1, \cdots, kn_0)$.

Furthermore, this would be the only critical region of size $\alpha$ independent of
$p_0$, because if a critical region of size $\alpha_N$, $(N = 0, 1, \cdots, kn_0)$, were selected in
the plane $\sum_{i=1}^{k} x_i = N$, it would be necessary that

$\sum_{N=0}^{kn_0} P_0[N]\,\alpha_N = \alpha$

independent of the value of $p_0$. From (3) this is equivalent to requiring that

(6)  $\sum_{N=0}^{kn_0} \dfrac{(kn_0)!}{N!\,(kn_0 - N)!}\, p_0^{N} q_0^{kn_0 - N}\, \alpha_N = \alpha$

independent of the value of $p_0$. Since the left side of (6), with $q_0 = 1 - p_0$, is a
polynomial in $p_0$, its constant term must equal $\alpha$ and all other coefficients must
vanish. It will be observed that no terms of the sum in (6) that arise from $N > r$
will contribute to the coefficient of $p_0^r$; consequently this coefficient will not
contain the unknowns $\alpha_{r+1}, \cdots, \alpha_{kn_0}$. These considerations show that the
$\alpha_N$ must satisfy equations of the form

$\alpha = c_{00}\,\alpha_0$
$0 = c_{10}\,\alpha_0 + c_{11}\,\alpha_1$
$\cdots$
$0 = c_{kn_0,0}\,\alpha_0 + c_{kn_0,1}\,\alpha_1 + \cdots + c_{kn_0,kn_0}\,\alpha_{kn_0}$.

It will also be observed that $c_{rr} = (kn_0)!/r!\,(kn_0 - r)!$; consequently the triangular
matrix of the coefficients in these $kn_0 + 1$ non-homogeneous equations is
non-singular. The equations therefore possess a unique solution, namely the known
solution of $\alpha_N = \alpha$.
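
The uniqueness argument can be checked numerically by building the triangular system explicitly and solving it by forward substitution. The sketch below is only illustrative: the function name `solve_sizes` and the symbol `M` (standing for $kn_0$) are introduced here, and the coefficients are obtained by expanding $(1 - p_0)^{M - N}$ with the binomial theorem, which is an assumption about how the $c_{rN}$ would be computed rather than something stated in the text.

```python
from fractions import Fraction
from math import comb

def solve_sizes(M, alpha):
    """Solve for alpha_0, ..., alpha_M from the requirement that
    sum_N C(M, N) p^N (1 - p)^(M - N) alpha_N = alpha identically in p.
    M plays the role of k * n_0."""
    alpha = Fraction(alpha)

    def c(r, N):
        # Coefficient of p^r contributed by the N-th term (N <= r).
        return comb(M, N) * comb(M - N, r - N) * (-1) ** (r - N)

    sizes = []
    for r in range(M + 1):
        rhs = alpha if r == 0 else Fraction(0)
        known = sum(c(r, N) * sizes[N] for N in range(r))
        sizes.append((rhs - known) / c(r, r))   # c(r, r) = C(M, r) is never zero
    return sizes

print(solve_sizes(6, Fraction(1, 20)))
```

Exact rational arithmetic is used so that the solution $\alpha_N = \alpha$ for every $N$ appears without rounding error.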

The preceding discussion shows that it is necessary to find critical regions of
size $\alpha$ in each plane $\sum_{i=1}^{k} x_i = N$, $(N = 0, 1, \cdots, kn_0)$, if a critical region
independent of $p_0$ is desired. If each such planar critical region were a best critical
region for that plane, then the totality of such regions would constitute a
best critical region independent of $p_0$ for testing $H_0'$ against $H_1'$.

It follows from the theory of best critical regions [5] that if a best critical region
in the plane $\sum_{i=1}^{k} x_i = N$ did exist, it would be determined by the inequality

(7)  $\dfrac{P_0[x_i \mid N]}{P_1[x_i \mid N]} < K$,

where $P_1$ corresponds to $P_0$ when $H_1'$ is true and where $K$ is a constant whose value
is chosen to make the critical region one of size $\alpha$. Now from (4),

(8)  $\dfrac{P_0[x_i \mid N]}{P_1[x_i \mid N]} = \dfrac{(n_0!)^k\,(kn_0 - N)!\,(kn_1)!\,\prod (n_1 - x_i)!}{(n_1!)^k\,(kn_1 - N)!\,(kn_0)!\,\prod (n_0 - x_i)!}$.

In order to study the possibility of a best critical region, it is therefore necessary
to study the possibility of (8) satisfying inequality (7).


5. Approximate best critical region. Unfortunately, because the variables
are discrete, it is not possible to find critical regions of exactly size $\alpha$ for arbitrary
$\alpha$ as required in (5). Consequently it is necessary to introduce continuous
approximating functions for discrete probability functions or to resort to other
devices if critical regions of the type discussed in the preceding section are to be
obtained. For the purpose of introducing such approximations, (8) will be written in
the following form:

(9)  $\dfrac{P_0[x_i \mid N]}{P_1[x_i \mid N]} = c_1\, \dfrac{\dfrac{(kn_0 - N)!}{\prod (n_0 - x_i)!}\left(\dfrac{1}{k}\right)^{kn_0 - N}}{\dfrac{(kn_1 - N)!}{\prod (n_1 - x_i)!}\left(\dfrac{1}{k}\right)^{kn_1 - N}}$,
where $c_1$ is independent of the variables $x_i$. It will be observed that the ratio
on the right is a ratio of two multinomial functions. Now the multinomial
function

$\dfrac{N!}{x_1!\,x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$,

where $\sum_{i=1}^{k} x_i = N$, can be approximated by the multivariate normal function

(10)  $\dfrac{\exp\left[-\dfrac{1}{2} \sum_{i=1}^{k} \dfrac{(x_i - Np_i)^2}{Np_i}\right]}{(2\pi N)^{(k-1)/2}\,\sqrt{p_1 p_2 \cdots p_k}}$.

The approximation is good provided the $Np_i$ are large and the $x_i$ remain away
from their extreme values. If this approximation is applied to both numerator
and denominator of (9), to this order of approximation,


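(11)  $\dfrac{P_0[x_i \mid N]}{P_1[x_i \mid N]} = c_1\, \dfrac{k^{k/2} \exp\left[-\dfrac{1}{2} \sum_{i=1}^{k} \left(\dfrac{x_i - N/k}{\sqrt{n_0 - N/k}}\right)^2\right] \Big/ \left[2\pi(kn_0 - N)\right]^{(k-1)/2}}{k^{k/2} \exp\left[-\dfrac{1}{2} \sum_{i=1}^{k} \left(\dfrac{x_i - N/k}{\sqrt{n_1 - N/k}}\right)^2\right] \Big/ \left[2\pi(kn_1 - N)\right]^{(k-1)/2}}$

$= c_1 \left[\dfrac{kn_1 - N}{kn_0 - N}\right]^{(k-1)/2} \exp\left[-\dfrac{1}{2}\, \dfrac{n_1 - n_0}{(n_1 - N/k)(n_0 - N/k)} \sum_{i=1}^{k} (x_i - N/k)^2\right]$.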


Since, by hypothesis, $n_1 > n_0$ and $n_0 > N/k$, except for the case of $n_0 = N/k$,
which will be considered later, it follows that

$\dfrac{n_1 - n_0}{(n_1 - N/k)(n_0 - N/k)} > 0$.

As a consequence, the right side of (11) will decrease in value as $\sum_{i=1}^{k} (x_i - N/k)^2$
increases in value. If $(x_1', \cdots, x_k')$ is a point lying on the sphere

(12)  $\sum_{i=1}^{k} (x_i - N/k)^2 = R$

and if the coordinates of this point satisfy inequality (7) when approximation
(11) is used, then all points outside this sphere will also satisfy (7) to this same
order of approximation. A best critical planar region of size $\alpha$ in this approximate
sense can therefore be obtained in the plane $\sum_{i=1}^{k} x_i = N$ by determining a
sphere with center at $(N/k, \cdots, N/k)$ such that when $H_0'$ is true the probability
is $\alpha$ that a point lying in the plane will lie outside this sphere. Furthermore, such
a region will be a common best critical region for all values of $n_1 > n_0$ because
the preceding arguments do not require the value of $n_1$ but merely the knowledge
that $n_1 > n_0$.

For the purpose of determining the radius of the sphere that will yield the
desired critical region, (4) will be expressed as follows:

(13)  $P_0[x_i \mid N] = c_2 \left[\dfrac{N!}{\prod x_i!}\left(\dfrac{1}{k}\right)^{N}\right] \left[\dfrac{(kn_0 - N)!}{\prod (n_0 - x_i)!}\left(\dfrac{1}{k}\right)^{kn_0 - N}\right]$,

where $c_2$ is independent of the $x_i$. If these multinomials are replaced by their
multivariate normal approximations as given by (10), to this approximation
(13) will reduce to

(14)  $P_0[x_i \mid N] = c_3 \exp\left[-\dfrac{1}{2} \sum_{i=1}^{k} \left(\dfrac{x_i - N/k}{\sqrt{N/k}}\right)^2\right] \exp\left[-\dfrac{1}{2} \sum_{i=1}^{k} \left(\dfrac{x_i - N/k}{\sqrt{n_0 - N/k}}\right)^2\right]$

$= c_3 \exp\left[-\dfrac{1}{2}\, \dfrac{\sum_{i=1}^{k} (x_i - N/k)^2}{\dfrac{N}{k}\left(1 - \dfrac{N}{kn_0}\right)}\right]$,

where $c_3$ is independent of the $x_i$. Since $\sum x_i = N$ here, $x_k$ may be expressed
in terms of the remaining variables; consequently (14), except for a constant
factor, may be treated as a normal distribution in the variables $x_1, \cdots, x_{k-1}$.
If the factorials in $c_3$ are replaced by their Stirling approximations, it will be
found that $c_3$ is the correct constant for the normal distribution.

Since it is known [6] that $-2$ times the exponent in a normal distribution function
possesses a chi-square distribution, it follows that to this order of
approximation

(15)  $\dfrac{\sum_{i=1}^{k} (x_i - N/k)^2}{\dfrac{N}{k}\left(1 - \dfrac{N}{kn_0}\right)}$

possesses a chi-square distribution with $k - 1$ degrees of freedom. If $\chi^2_\alpha$ is a
value such that $P[\chi^2 > \chi^2_\alpha] = \alpha$, then

(16)  $\dfrac{\sum_{i=1}^{k} (x_i - N/k)^2}{\dfrac{N}{k}\left(1 - \dfrac{N}{kn_0}\right)} = \chi^2_\alpha$

determines a sphere such that to this order of approximation the probability is
$\alpha$ that a point lying in the plane $\sum_{i=1}^{k} x_i = N$ will lie outside the sphere. From
the arguments following (12), it therefore follows that a common best critical
region in this approximate sense for testing $H_0'$ against $H_1'$ will consist of that
part of each plane $\sum_{i=1}^{k} x_i = N$, $(N = 0, 1, \cdots, kn_0)$, which lies outside the
corresponding sphere given by (16). Since the $x_i$ are non-negative and do not
exceed $n_0$, the planes corresponding to $N = 0$ and $N = kn_0$ contain a single
point; therefore it is necessary to adopt some convention that assigns $100\alpha$ percent
of the samples with $N = 0$ and $N = kn_0$ to a critical region in order to obtain
critical regions of size $\alpha$ in these two cases.
For a given set of data, the procedure to be followed then consists in calculating
the statistic

$z = \dfrac{\sum_{i=1}^{k} (x_i - \bar{x})^2}{\bar{x}\,(1 - \bar{x}/n_0)}$,

where $\bar{x} = \sum_{i=1}^{k} x_i / k$, and agreeing to reject the hypothesis that $n = n_0$ in
favor of the alternative hypothesis that $n > n_0$ if and only if $z > \chi^2_\alpha$, where
$P[\chi^2 > \chi^2_\alpha] = \alpha$ for $k - 1$ degrees of freedom. Because of the nature of the
approximations used in (10) and (14), this result may be expected to be accurate
only if $\bar{x}$ and $n_0 - \bar{x}$ are large.
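
The procedure just described amounts to a few lines of arithmetic. The following sketch assumes the statistic has the form $z = \sum (x_i - \bar{x})^2 / [\bar{x}(1 - \bar{x}/n_0)]$ obtained from (16) with $N/k = \bar{x}$; the function names and the sample data are hypothetical, and the critical value is taken as an argument (read from a chi-square table for $k - 1$ degrees of freedom) rather than computed.

```python
def binomial_dispersion_index(xs, n0):
    """Binomial index of dispersion z for observed frequencies xs under n = n0."""
    k = len(xs)
    mean = sum(xs) / k
    if not 0 < mean < n0:
        # N = 0 and N = k*n0 are the single-point planes needing a separate convention.
        raise ValueError("the mean must lie strictly between 0 and n0")
    sum_squares = sum((x - mean) ** 2 for x in xs)
    return sum_squares / (mean * (1 - mean / n0))

def rejects_n0(xs, n0, chi2_critical):
    """Reject n = n0 in favor of n > n0 when z exceeds the tabled critical value."""
    return binomial_dispersion_index(xs, n0) > chi2_critical

# Hypothetical data: k = 5 samples, hypothesis n0 = 10.
xs = [3, 5, 4, 6, 2]
z = binomial_dispersion_index(xs, n0=10)
print(round(z, 4))
# 9.488 is the tabled 5 percent point of chi-square with 4 degrees of freedom.
print(rejects_n0(xs, n0=10, chi2_critical=9.488))
```

For these data $\bar{x} = 4$, $\sum (x_i - \bar{x})^2 = 10$ and $z = 10/2.4 \approx 4.17$, so the hypothesis $n = 10$ would not be rejected at the 5 percent level.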

The interesting feature of this result is that the familiar binomial index of
dispersion, $z$, possesses optimum properties in this approximate sense for testing
$n = n_0$ against $n > n_0$.

6. Poisson application. Since the preceding test will possess approximate
optimum properties for $n$ as large as desired, independent of the value of $p$,
and since a Poisson distribution with parameter $m$ can be approximated as
closely as desired by means of a binomial distribution with $np = m$ by allowing
$n$ to increase sufficiently, it follows that the test will also possess approximate
optimum properties for deciding between a binomial distribution with $n = n_0$
and a Poisson distribution.

7. Estimation of n. Although the purpose of this paper has been accomplished
in the preceding sections, it is interesting to observe the role played by the closely
related Poisson index of dispersion in the estimation of $n$.

Approximate confidence limits for $n$ may be obtained by means of (16).
If $\chi^2_{1-\alpha}$ is a value of $\chi^2$ such that $P[\chi^2 > \chi^2_{1-\alpha}] = 1 - \alpha$, then, to this same order
of approximation, the probability is $1 - 2\alpha$ that

$\chi^2_{1-\alpha} < \dfrac{\sum (x_i - \bar{x})^2}{\bar{x}\,(1 - \bar{x}/n)} < \chi^2_\alpha$.

If these inequalities are solved for $n$, the following $100(1 - 2\alpha)$ percent approximate
confidence limits for $n$ will be obtained.

(17)  $\dfrac{\bar{x}\,\chi^2_\alpha}{\chi^2_\alpha - \sum (x_i - \bar{x})^2/\bar{x}} < n < \dfrac{\bar{x}\,\chi^2_{1-\alpha}}{\chi^2_{1-\alpha} - \sum (x_i - \bar{x})^2/\bar{x}}$.


Only the lower limit here will possess optimum properties. Now it will be
observed that only positive values of $n$ will be admissible if

$\sum (x_i - \bar{x})^2/\bar{x} < \chi^2_{1-\alpha}$,

whereas only negative values will be admissible if

$\sum (x_i - \bar{x})^2/\bar{x} > \chi^2_\alpha$.

The range of values will be infinite in each case if there is equality rather than
inequality. If, however,

$\chi^2_{1-\alpha} < \sum (x_i - \bar{x})^2/\bar{x} < \chi^2_\alpha$,

then both positive and negative values of $n$ over infinite ranges will be admissible.
Since $n$ increases as the Poisson index $\sum (x_i - \bar{x})^2/\bar{x}$ increases until it becomes
infinite and then increases from minus infinity through negative values, (17)
may still be thought of as giving an interval (infinite) of values with a positive
“lower” limit and a negative “upper” limit. Thus, the familiar Poisson index
of dispersion plays an interesting role in determining whether a Poisson
assumption is reasonable as far as admissible values of $n$ are concerned.

If the population is truly binomial, negative values of $n$ must be ruled out;
consequently a Poisson assumption becomes increasingly tenable as the Poisson
index increases. However, experience has shown [7] that a negative binomial
distribution is often more realistic in describing data supposedly drawn from a
binomial or Poisson population than is the assumed distribution; consequently
a negative binomial should be given consideration if (17) yields only negative
values or if it yields a negative “upper” limit that is numerically small relative
to a positive “lower” limit.

It is also interesting to consider the point estimation of $n$. Here, it is
customary [7] to estimate $n$ by means of

$\hat{n} = \dfrac{\bar{x}}{1 - \dfrac{\sum (x_i - \bar{x})^2}{k\bar{x}}}$.

Thus, a positive, infinite, or negative estimate for $n$ will be obtained according as
the Poisson index is less than, equal to, or greater than $k$.
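
The behaviour of the limits and of the point estimate can be tried on a small example. The sketch below assumes the limits have the form $\bar{x}\chi^2/(\chi^2 - I)$, where $I = \sum (x_i - \bar{x})^2/\bar{x}$ is the Poisson index, and that the point estimate is $\bar{x}/(1 - I/k)$, which is consistent with the statement that the estimate is positive, infinite or negative according as the index is less than, equal to or greater than $k$. The function names and data are hypothetical, and the chi-square points are supplied from a table.

```python
def poisson_index(xs):
    """Poisson index of dispersion: sum of squared deviations divided by the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / mean

def estimate_n(xs):
    """Point estimate of n; infinite when the Poisson index equals k."""
    k = len(xs)
    mean = sum(xs) / k
    index = poisson_index(xs)
    if index == k:
        return float("inf")
    return mean / (1 - index / k)

def confidence_limits(xs, chi2_alpha, chi2_one_minus_alpha):
    """Approximate limits for n in the assumed form of (17). A negative second
    value is the negative "upper" limit discussed in the text."""
    mean = sum(xs) / len(xs)
    index = poisson_index(xs)
    if index in (chi2_alpha, chi2_one_minus_alpha):
        raise ValueError("index equals a chi-square point; that limit is infinite")
    lower = mean * chi2_alpha / (chi2_alpha - index)
    upper = mean * chi2_one_minus_alpha / (chi2_one_minus_alpha - index)
    return lower, upper

xs = [3, 5, 4, 6, 2]                      # hypothetical data, k = 5
print(poisson_index(xs), estimate_n(xs))
# Tabled points of chi-square with 4 degrees of freedom: 9.488 (5%) and 0.711 (95%).
print(confidence_limits(xs, chi2_alpha=9.488, chi2_one_minus_alpha=0.711))
```

Here the index is 2.5, which lies between the two chi-square points, so the sketch returns a positive “lower” limit (about 5.43) together with a negative “upper” limit, the mixed case described above; the point estimate is 8.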

BILINEAR FORMS IN NORMALLY CORRELATED VARIABLES

By Allen T. Craig
University of Iowa

1. Summary. If a variable $x$ is normally distributed with mean zero, we have
previously given a necessary and sufficient condition (see references at end of
this paper) for the independence of two real symmetric quadratic forms in $n$
independent values of that variable. This condition is that the product of the
matrices of the forms should vanish. In the present paper, we have proved
that the same algebraic condition is both necessary and sufficient for the
independence of two real symmetric bilinear forms, or of a real symmetric bilinear
form and a quadratic form, in normally correlated variables.

2. Introduction. In this paper, we determine the moment generating function 
of the joint distribution of two real symmetric bilinear forms in certain normally 
correlated variables and derive a necessary and sufficient condition for the 
independence, in the probability sense, of these forms. We further investigate 
the condition for independence, in the probability sense, of real symmetric 
bilinear and quadratic forms. 

3. The moment generating function of the distribution of real symmetric
bilinear forms. Let the two variables $x$ and $y$ have a joint normal distribution
with means zero, unit variances and correlation coefficient $\rho$. From this
bivariate distribution, repeated random samples of $n$ pairs, say $(x_1, y_1), (x_2, y_2),
\cdots, (x_n, y_n)$, are drawn. Let $C = \|c_{jk}\|$ be a real symmetric matrix and write
$\theta = \sum\sum c_{jk} x_j y_k$. The moment generating function of the distribution of $\theta$
is then given by

$\varphi(t) = E[e^{t\theta}] = \left(2\pi\sqrt{1 - \rho^2}\right)^{-n} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{t\theta + Q}\, dy_n\, dx_n \cdots dy_1\, dx_1$,

where

$Q = -\dfrac{1}{2(1 - \rho^2)} \sum_{j=1}^{n} (x_j^2 + y_j^2 - 2\rho x_j y_j)$

and $\theta$ is defined above. If we subject the $x$'s and $y$'s to the same linear
homogeneous transformation with appropriately chosen orthogonal matrix $L$, then
$Q$ remains invariant and $\theta$ becomes $\sum \lambda_j x_j y_j$, where the $\lambda$'s are the $n$ real roots of
the characteristic equation of $C$, that is, of $|C - \lambda I| = 0$. The integrations
are then easily effected and we find that

$\varphi(t) = \left\{\prod_{j=1}^{n} [1 - t(\rho + 1)\lambda_j][1 - t(\rho - 1)\lambda_j]\right\}^{-1/2}$

$= \left\{|I - t(\rho + 1)C| \cdot |I - t(\rho - 1)C|\right\}^{-1/2}$

$= |I - 2\rho tC - (1 - \rho^2)t^2C^2|^{-1/2}$,

where $I$ is the unit matrix of order $n$ and the vertical bars, as usual, indicate the
determinant of the enclosed matrix.

Next, let $A = \|a_{jk}\|$ and $B = \|b_{jk}\|$ be two real symmetric matrices each of
order $n$. Write $\theta_1 = \sum\sum a_{jk} x_j y_k$ and $\theta_2 = \sum\sum b_{jk} x_j y_k$, where the $x$'s and $y$'s are the
items of the sample randomly drawn from the bivariate distribution previously
described. The moment generating function of the joint distribution of $\theta_1$ and $\theta_2$
is then given by

$\varphi(t_1, t_2) = \left(2\pi\sqrt{1 - \rho^2}\right)^{-n} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{t_1\theta_1 + t_2\theta_2 + Q}\, dy_n\, dx_n \cdots dy_1\, dx_1$,

where $\theta_1$, $\theta_2$, and $Q$ have the meanings previously assigned to them. If we
pursue a line of reasoning similar to that above, we find that

$\varphi(t_1, t_2) = |I - 2\rho(t_1A + t_2B) - (1 - \rho^2)(t_1A + t_2B)^2|^{-1/2}$.

4. The independence of bilinear forms. It is clear that there exist positive
numbers, say $h_1$ and $h_2$, such that $\varphi(t_1, t_2)$ exists for $0 < t_1 < h_1$ and $0 < t_2 < h_2$.
It is well known that a necessary and sufficient condition for the independence
of $\theta_1$ and $\theta_2$ is that $\varphi(t_1, t_2)$ shall factor into the product $\varphi(t_1, 0)\varphi(0, t_2)$. If then,
we assume $\theta_1$ and $\theta_2$ to be independent, we have essentially

(1)  $|I - 2\rho(t_1A + t_2B) - (1 - \rho^2)(t_1A + t_2B)^2|$
$= |I - 2\rho t_1A - (1 - \rho^2)t_1^2A^2| \cdot |I - 2\rho t_2B - (1 - \rho^2)t_2^2B^2|$.

If $h$ denotes the smaller of $h_1$ and $h_2$, then the factored form holds for
$0 < t_1, t_2 < h$, and hence for all real values of $t_1$ and $t_2$. In particular it holds
for $t_2 = t_1$ so that

$|I - 2\rho t_1(A + B) - (1 - \rho^2)t_1^2(A + B)^2|$
$= |I - 2\rho t_1A - (1 - \rho^2)t_1^2A^2| \cdot |I - 2\rho t_1B - (1 - \rho^2)t_1^2B^2|$.

Let $r_1$, $r_2$, and $r \le r_1 + r_2$ denote the ranks of the matrices $A$, $B$, and $A + B$.
Further let the real non-zero roots of the characteristic equations of these
matrices be denoted respectively by $\alpha_1, \alpha_2, \cdots, \alpha_{r_1}$; $\beta_1, \beta_2, \cdots, \beta_{r_2}$; and
$\gamma_1, \gamma_2, \cdots, \gamma_r$. Then the members of the preceding equation may be written

$\prod_{i=1}^{r} [1 - t_1(\rho + 1)\gamma_i][1 - t_1(\rho - 1)\gamma_i]$

and

$\prod_{i=1}^{r_1} [1 - t_1(\rho + 1)\alpha_i][1 - t_1(\rho - 1)\alpha_i] \prod_{i=1}^{r_2} [1 - t_1(\rho + 1)\beta_i][1 - t_1(\rho - 1)\beta_i]$

respectively. It is seen that the left member is a polynomial in $t_1$ of degree $2r$
and that the right member is a polynomial in $t_1$ of degree $2(r_1 + r_2)$. Accordingly,
$r = r_1 + r_2$ and the roots $\gamma_1, \cdots, \gamma_r$ consist of the roots $\alpha_1, \cdots, \alpha_{r_1}$,
$\beta_1, \cdots, \beta_{r_2}$. That is, if $\theta_1$ and $\theta_2$ are independent, then the rank of $A + B$
is the sum of the ranks of $A$ and $B$ and the non-zero roots of the characteristic
equation of $A + B$ consist of those of the characteristic equation of $A$ together
with those of $B$. Further, if in (1) we put $t_2 = vt_1$, where $v$ is real, we have

$|I - 2\rho t_1(A + vB) - (1 - \rho^2)t_1^2(A + vB)^2|$
$= |I - 2\rho t_1A - (1 - \rho^2)t_1^2A^2| \cdot |I - 2\rho t_1vB - (1 - \rho^2)t_1^2v^2B^2|$.

Denote the rank of $A + vB$ by $r'$ and the non-zero roots of its characteristic
equation by $\delta_1, \cdots, \delta_{r'}$. The immediately preceding equation can then be
written

$\prod_{i=1}^{r'} [1 - t_1(\rho + 1)\delta_i][1 - t_1(\rho - 1)\delta_i]$
$= \prod_{i=1}^{r_1} [1 - t_1(\rho + 1)\alpha_i][1 - t_1(\rho - 1)\alpha_i] \prod_{i=1}^{r_2} [1 - t_1(\rho + 1)v\beta_i][1 - t_1(\rho - 1)v\beta_i]$.

From this we infer that, apart from zero roots, the roots of the characteristic
equation of $A + vB$ are $\alpha_1, \cdots, \alpha_{r_1}, v\beta_1, \cdots, v\beta_{r_2}$.
If a symmetric matrix, say $M(v)$, has elements which are real polynomials
in the real variable $v$, and if the determinant

$|M(v) - \lambda I| = (-1)^n [\lambda - p_1(v)][\lambda - p_2(v)] \cdots [\lambda - p_n(v)]$,

where $p_1(v), p_2(v), \cdots, p_n(v)$ are likewise real polynomials in $v$, then there exists,
for all real values of $v$, a real orthogonal matrix, say $L(v)$, such that

$L'(v)M(v)L(v) = \operatorname{diag}\left(p_1(v), p_2(v), \cdots, p_n(v)\right)$.

Furthermore¹, $\dfrac{dL(v)}{dv}$ exists for all real values of $v$. Since

$|A + vB - \lambda I| = (-1)^n \lambda^{n - (r_1 + r_2)} (\lambda - \alpha_1) \cdots (\lambda - \alpha_{r_1})(\lambda - v\beta_1) \cdots (\lambda - v\beta_{r_2})$,

then $A + vB$ belongs to the class $M(v)$ so we have

(2)  $L'(v)(A + vB)L(v) = \operatorname{diag}\left(\alpha_1, \cdots, \alpha_{r_1}, v\beta_1, \cdots, v\beta_{r_2}, 0, \cdots, 0\right)$.

In particular,

(3)  $L'(0)AL(0) = \operatorname{diag}\left(\alpha_1, \cdots, \alpha_{r_1}, 0, \cdots, 0\right)$.

If we differentiate (2) with respect to $v$ and subsequently set $v = 0$, we have

(4)  $\dfrac{dL'(0)}{dv}AL(0) + L'(0)BL(0) + L'(0)A\dfrac{dL(0)}{dv} = \operatorname{diag}\left(0, \cdots, 0, \beta_1, \cdots, \beta_{r_2}, 0, \cdots, 0\right)$,

in which the first $r_1$ diagonal elements of the right member are zero.

¹ A number of years ago, in connection with another problem, the writer sought the
assistance of Professor N. H. McCoy for a proof that $L(v)$ is differentiable at $v = 0$.
Professor McCoy's elegant demonstration of the existence of $L(v)$ showed that each element
of this orthogonal matrix is itself a real polynomial in $v$, divided by the positive square
root of another real polynomial, which polynomial is never negative and which vanishes
for no real value of $v$. Thus the derivative of $L(v)$ exists not only for $v = 0$ but for all
real values of $v$. The writer thanks Professor McCoy for his kind and generous assistance.



Since $L(v)$ is orthogonal, then $L'(v)L(v) = I$. Upon differentiating both members
with respect to $v$, and subsequently setting $v = 0$, it is seen that
$\dfrac{dL'(0)}{dv}L(0) = -L'(0)\dfrac{dL(0)}{dv}$, so that $L'(0)\dfrac{dL(0)}{dv}$ is a skew-symmetric matrix, say $S$. Further

(5)  $\dfrac{dL'(0)}{dv} = -L'(0)\dfrac{dL(0)}{dv}L'(0) = -SL'(0)$,

and, by taking conjugates (transposes),

(6)  $\dfrac{dL(0)}{dv} = -L(0)\dfrac{dL'(0)}{dv}L(0) = L(0)S$.

If we multiply (5) on the right by $AL(0)$ and (6) on the left by $L'(0)A$, we see
that (4) may be written

(7)  $L'(0)BL(0) = \operatorname{diag}\left(0, \cdots, 0, \beta_1, \cdots, \beta_{r_2}, 0, \cdots, 0\right) + SL'(0)AL(0) - L'(0)AL(0)S$.

Since $S$ is skew-symmetric and since $L'(0)AL(0)$ is given by (3), then each
element on the principal diagonal of $SL'(0)AL(0)$ and $L'(0)AL(0)S$ is zero.
Further, since $L'(0)BL(0)$ is symmetric, then $L'(0)BL(0)$ takes the form of a
symmetric matrix whose principal diagonal is that of the first term in the right
member of (7) and whose elements off the principal diagonal will be denoted by $b_{ij}$.
Because the non-zero roots of the characteristic equation of $L'(0)BL(0)$ are $\beta_1,
\cdots, \beta_{r_2}$, then the sum of all two-rowed principal minors of the determinant of
$L'(0)BL(0)$ must equal the sum of the products of $\beta_1, \cdots, \beta_{r_2}$ taken two at a
time. That is

$\sum_{i<j} \beta_i\beta_j - \sum_{i<j} b_{ij}^2 = \sum_{i<j} \beta_i\beta_j$,

so that each $b_{ij}$, being real, is zero. Accordingly, $SL'(0)AL(0) - L'(0)AL(0)S$
is a zero matrix and $L'(0)BL(0)$ is given by the first term in the right member
of (7). We then have

$L'(0)AL(0)L'(0)BL(0) = L'(0)ABL(0) = 0$,

from which it follows that $AB = 0$. Thus, if the real symmetric bilinear forms
$\theta_1$ and $\theta_2$ are independent in the probability sense, the product of their matrices
is zero.

If, conversely, AB = 0, then

$\varphi(t_1, t_2) = |I - 2\rho(t_1A + t_2B) - (1 - \rho^2)(t_1^2A^2 + t_2^2B^2)|^{-1/2}$

$= |[I - 2\rho t_1A - (1 - \rho^2)t_1^2A^2][I - 2\rho t_2B - (1 - \rho^2)t_2^2B^2]|^{-1/2}$
$= \varphi(t_1, 0)\,\varphi(0, t_2),$

and $\theta_1$ and $\theta_2$ are independent. This establishes the following theorem.

Theorem I. Let x and y be normally correlated with means zero, unit variances,
and correlation coefficient $\rho$. Let $\theta_1$ and $\theta_2$ be two real symmetric bilinear forms in n
random pairs of values of x and y, say $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. A necessary
and sufficient condition that $\theta_1$ and $\theta_2$ be independent in the probability sense is that
the product of the matrices of the forms be zero.

5. Simultaneous reduction of quadratic or bilinear forms. The argument
of Section 4 may be used to establish in a very simple manner the following
theorem.

Theorem II. Let A and B be two real symmetric matrices with constant elements,
each matrix of order n. A necessary and sufficient condition that there exist
a real orthogonal matrix L of order n such that simultaneously each of L'AL and L'BL
is in canonical form, wherein no non-zero elements occupy corresponding positions
on the principal diagonals, is that AB = 0.

For if such an orthogonal matrix L exists, it is evident that L'ALL'BL =
L'ABL = 0, from which it follows that AB = 0. Conversely, if AB = 0, then, v
being a real scalar, the matrix $(A - \lambda I)(vB - \lambda I)$ is equal to the matrix
$-\lambda[(A + vB) - \lambda I]$. These matrices being equal, their determinants are
equal, so that A + vB belongs to the class M(v) of Section 4. Thus L may be
taken as L(0) and simultaneously L'AL and L'BL are of the form stated in the
theorem.
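
The two facts used in this proof can be checked by hand on a small case. The sketch below is only an illustration: the 2 x 2 matrices A and B, the scalars v and lam, and the helper names are hypothetical choices, not taken from the paper. It verifies the identity $(A - \lambda I)(vB - \lambda I) = -\lambda[(A + vB) - \lambda I]$ when AB = 0, and shows one orthogonal matrix that reduces both A and B to diagonal form with non-zero entries in different positions.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def shifted(X, lam):
    """Return X - lam * I."""
    n = len(X)
    return [[X[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

# Hypothetical example with AB = 0.
A = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(1)]]
B = [[Fraction(1), Fraction(-1)], [Fraction(-1), Fraction(1)]]
AB = matmul(A, B)

# Identity from the proof, for arbitrary (here hypothetical) v and lam.
v, lam = Fraction(3, 2), Fraction(5, 7)
vB = [[v * b for b in row] for row in B]
lhs = matmul(shifted(A, lam), shifted(vB, lam))
A_plus_vB = [[A[i][j] + vB[i][j] for j in range(2)] for i in range(2)]
rhs = [[-lam * x for x in row] for row in shifted(A_plus_vB, lam)]

# M = sqrt(2) * L for the orthogonal L with columns (1,1)/sqrt(2), (1,-1)/sqrt(2),
# so that L'XL = M'XM / 2 and exact arithmetic can be kept.
M = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(-1)]]
LAL = [[x / 2 for x in row] for row in matmul(matmul(transpose(M), A), M)]
LBL = [[x / 2 for x in row] for row in matmul(matmul(transpose(M), B), M)]

print(AB, lhs == rhs, LAL, LBL)
```

Here L'AL = diag(2, 0) and L'BL = diag(0, 2), so the non-zero roots occupy different diagonal positions, as the theorem requires.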

6. Independence of bilinear and quadratic forms. Let $\theta = \sum\sum a_{jk}x_jy_k$ be a
real symmetric bilinear form of rank $r_1$ in the previously defined variables
$(x_1, y_1), \dots, (x_n, y_n)$ and let $q = \sum\sum b_{jk}x_jx_k$ be a real symmetric quadratic form
of rank $r_2$ in $x_1, \dots, x_n$. As usual, denote the non-zero roots of the characteristic
equations of A and B by $\alpha_1, \alpha_2, \dots, \alpha_{r_1}$ and $\beta_1, \beta_2, \dots, \beta_{r_2}$ respectively.

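The moment generating function of the joint distribution of $\theta$ and $q$ is

$\varphi(t_1, t_2) = \left(\frac{1}{2\pi\sqrt{1-\rho^2}}\right)^{n} \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \exp\left\{t_1\theta + t_2q - \frac{1}{2(1-\rho^2)}\sum_j (x_j^2 + y_j^2 - 2\rho x_jy_j)\right\} dy_n\,dx_n \cdots dy_1\,dx_1.$

We first orthogonally transform the variables so that the exponent in the integrand
becomes, upon writing $\|c_{jk}\| = L'BL$,

$t_1\sum_j \alpha_jx_j'y_j' + t_2\sum\sum c_{jk}x_j'x_k' - \frac{1}{2(1-\rho^2)}\sum_j (x_j'^2 + y_j'^2 - 2\rho x_j'y_j').$

We then integrate on $y_1', y_2', \dots, y_n'$ and obtain for the exponent in the integrand

$t_2\sum\sum c_{jk}x_j'x_k' - \tfrac{1}{2}\sum_j x_j'^2 + \rho t_1\sum_j \alpha_jx_j'^2 + \frac{1-\rho^2}{2}t_1^2\sum_j \alpha_j^2x_j'^2.$

If we effect on the variables $x_1', x_2', \dots, x_n'$ the inverse of the orthogonal
transformation initially used on the x's and y's, the exponent in the integrand becomes,
using $\|g_{jk}\| = A^2$,

$t_2\sum\sum b_{jk}x_jx_k - \tfrac{1}{2}\sum_j x_j^2 + \rho t_1\sum\sum a_{jk}x_jx_k + \frac{1-\rho^2}{2}t_1^2\sum\sum g_{jk}x_jx_k,$

or

$-\tfrac{1}{2}\sum\sum [\delta_{jk} - 2\rho t_1a_{jk} - (1-\rho^2)t_1^2g_{jk} - 2t_2b_{jk}]\,x_jx_k,$
where $\delta_{jk}$ equals 1 or 0 according as j does or does not equal k. Hence,

(8)    $\varphi(t_1, t_2) = |I - 2\rho t_1A - (1-\rho^2)t_1^2A^2 - 2t_2B|^{-1/2}.$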

If $\theta$ and $q$ are independent, we have

(9)    $|I - 2\rho t_1A - (1-\rho^2)t_1^2A^2 - 2t_2B| = |I - 2\rho t_1A - (1-\rho^2)t_1^2A^2| \cdot |I - 2t_2B|,$

for $0 < t_1 < h_1$ and $0 < t_2 < h_2$. As before, the members of (9) are polynomials
which, being equal for $0 < t_1 < h_1$, $0 < t_2 < h_2$, are equal for all real values of $t_1$ and $t_2$.
If we put $t_1 = 1$ and $t_2 = vt_1 = v$, where v is real, then (9) becomes

$|I - 2\rho A - (1-\rho^2)A^2 - 2vB| = |I - 2\rho A - (1-\rho^2)A^2| \cdot |I - 2vB|$

$= \prod_{j=1}^{r_1} [1 - (\rho - 1)\alpha_j][1 - (\rho + 1)\alpha_j] \prod_{k=1}^{r_2} (1 - 2v\beta_k).$

That is,

$|2\rho A + (1-\rho^2)A^2 + 2vB - \lambda I|$

$= (-1)^n\lambda^{n-r_1-r_2}\,[\lambda - 2\rho\alpha_1 - (1-\rho^2)\alpha_1^2] \cdots [\lambda - 2\rho\alpha_{r_1} - (1-\rho^2)\alpha_{r_1}^2] \cdot [\lambda - 2v\beta_1] \cdots [\lambda - 2v\beta_{r_2}]$

so that $2\rho A + (1-\rho^2)A^2 + 2vB$ is a matrix of the class M(v). Hence we write

$L'(v)[2\rho A + (1-\rho^2)A^2 + 2vB]L(v) = \operatorname{diag}(2\rho\alpha_1 + (1-\rho^2)\alpha_1^2, \dots, 2\rho\alpha_{r_1} + (1-\rho^2)\alpha_{r_1}^2,\; 2v\beta_1, \dots, 2v\beta_{r_2},\; 0, \dots, 0).$

The argument of Section 4 shows that $L'(0)[2\rho A + (1-\rho^2)A^2]L(0)\,L'(0)\,2B\,L(0)$
is a zero matrix, from which it follows that $2\rho AB + (1-\rho^2)A^2B = 0$. But
this imposes on $\rho$, $n^2$ conditions of the form

$2\rho\,l_{jk} + (1-\rho^2)\,m_{jk} = 0, \quad (j, k = 1, 2, \dots, n).$

Since these hold for every $-1 < \rho < 1$, they hold identically. Hence each $l_{jk}$
and $m_{jk}$ is zero. In particular, $\|l_{jk}\| = AB = 0$ if $\theta$ and $q$ are independent.
Conversely, if AB = 0, we see by Theorem II that (8) becomes

$\varphi(t_1, t_2) = \varphi(t_1, 0)\,\varphi(0, t_2),$

so that $\theta$ and $q$ are independent. This yields Theorem III.

Theorem III. Let x and y be normally correlated with means zero, unit variances,
and correlation coefficient $\rho$. Let $\theta$ be a real symmetric bilinear form in the
n random pairs of values of x and y, say $(x_1, y_1), \dots, (x_n, y_n)$, and let $q$ be a real
symmetric quadratic form in $x_1, \dots, x_n$ (or $y_1, \dots, y_n$). A necessary and
sufficient condition that $\theta$ and $q$ be independent in the probability sense is that the
product of the matrices of the forms be zero.

For example, let $\theta$ be n times the sample covariance and let $q$ be n times the
square of the mean of the x's. Then

$\theta = \sum_j (x_j - \bar{x})(y_j - \bar{y}) = \sum\sum a_{jk}x_jy_k,$

where

$a_{jk} = 1 - \frac{1}{n}$ if $j = k$, and $a_{jk} = -\frac{1}{n}$ otherwise,

and

$q = n\bar{x}^2 = \sum\sum b_{jk}x_jx_k, \quad b_{jk} = 1/n$ for $j, k = 1, 2, \dots, n$.

Since AB = 0, then $\theta$ and $q$ are independent, a fact otherwise known but perhaps
not so easily established.
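
The product AB in this example can be verified directly. The following sketch uses exact rational arithmetic; the sample size n = 5, the data vectors x and y, and the helper names are hypothetical and serve only to confirm that the matrices reproduce the two forms and that AB vanishes.

```python
from fractions import Fraction

def example_matrices(n):
    """A = I - J/n (matrix of theta) and B = J/n (matrix of q), J the all-ones matrix."""
    inv = Fraction(1, n)
    A = [[(1 if j == k else 0) - inv for k in range(n)] for j in range(n)]
    B = [[inv for _ in range(n)] for _ in range(n)]
    return A, B

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def form(M, u, w):
    """Evaluate the form sum_j sum_k m_jk u_j w_k."""
    return sum(M[j][k] * u[j] * w[k] for j in range(len(u)) for k in range(len(w)))

n = 5
A, B = example_matrices(n)
AB = matmul(A, B)

# Hypothetical data, used only to check the two forms against their definitions.
x = [Fraction(t) for t in (2, -1, 4, 0, 3)]
y = [Fraction(t) for t in (1, 5, -2, 2, 0)]
xbar, ybar = sum(x) / n, sum(y) / n
theta_direct = sum((xj - xbar) * (yj - ybar) for xj, yj in zip(x, y))
theta_form = form(A, x, y)
q_form = form(B, x, x)

print(all(entry == 0 for row in AB for entry in row), theta_direct == theta_form)
```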
ON THE CHARLIER TYPE B SERIES

By S. Kullback
George Washington University 

1. Introduction. The Type B series of Charlier has been discussed in some
detail in the literature (see references at the end of the paper). The problem
of the convergence of the Type B series has been considered by Pollaczek-Geiringer
[12], [13], Szegö [12] (page 110), Uspensky [18], Jacob [5], Schmidt [16]
and Obrechkoff [11]. There is presented in the following a method of development
of the Type B series which is believed to be of some interest, including a
necessary and sufficient condition for the convergence which is basically the
same as that of Schmidt [16]. A result of Steffensen [17] is extended and shown
to be related to the Charlier Type B series.


2. Statement of results. Consider the function p(r), defined for r = 0, 1, 2,
..., and such that

(2.1)    $\sum_{r=0}^{\infty} p(r) = 1; \qquad \sum_{r=0}^{\infty} |p(r)| = A$

where A is some finite value. Let the n-th factorial moment be defined by

(2.2)    $\mu_{(0)} = 1, \qquad \mu_{(n)} = \sum_{r=0}^{\infty} r(r-1)(r-2)\cdots(r-n+1)\,p(r), \quad (n = 1, 2, \dots).$

For arbitrary $\lambda$ let

(2.3)    $L_n = \mu_{(n)} - n\mu_{(n-1)}\lambda + \frac{n(n-1)}{2!}\mu_{(n-2)}\lambda^2 - \frac{n(n-1)(n-2)}{3!}\mu_{(n-3)}\lambda^3 + \cdots + (-1)^n\lambda^n.$

We prove the following results:

Theorem. A necessary and sufficient condition that the function p(r) of (2.1)
may be expressed by the absolutely convergent series

(2.4)    $p(r) = \frac{e^{-\lambda}\lambda^r}{r!} + L_1\frac{\partial}{\partial\lambda}\frac{e^{-\lambda}\lambda^r}{r!} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2}\frac{e^{-\lambda}\lambda^r}{r!} + \cdots$

is that

(2.5)    $1 + |\mu_{(1)}| + \frac{1}{2!}|\mu_{(2)}| + \frac{1}{3!}|\mu_{(3)}| + \cdots + \frac{1}{n!}|\mu_{(n)}| + \cdots$

converges, where $L_n$ is defined as in (2.3).

3. Generating functions. For the function p(r) of (2.1) consider the generating
function defined by

(3.1)    $\varphi(z) = \sum_{r=0}^{\infty} z^r p(r)$

where z is a complex variable. Because of (2.1) it is clear that the right member
of (3.1) is uniformly and absolutely convergent for $|z| \le 1$ so that the radius
of convergence of (3.1) is some value $R_1 \ge 1$.

The Taylor expansion of $\varphi(z)$ about the point z = 1 is given by

(3.2)    $\varphi(z) = \varphi(1) + (z - 1)\varphi'(1) + \frac{(z-1)^2}{2!}\varphi''(1) + \cdots$

where, as may be readily obtained from (3.1),

(3.3)    $\varphi^{(n)}(1) = \sum_{r=0}^{\infty} r(r-1)(r-2)\cdots(r-n+1)\,p(r) = \mu_{(n)}.$

If it is assumed that (2.5) converges, then

(3.4)    $\varphi(z) = 1 + (z-1)\mu_{(1)} + \frac{(z-1)^2}{2!}\mu_{(2)} + \cdots + \frac{(z-1)^n}{n!}\mu_{(n)} + \cdots$
is uniformly and absolutely convergent for $|z - 1| \le 1$.

4. Sufficiency. For arbitrary $\lambda$ let us set

(4.1)    $e^{-\lambda(z-1)}\left(1 + \mu_{(1)}(z-1) + \mu_{(2)}\frac{(z-1)^2}{2!} + \cdots\right) = 1 + L_1(z-1) + \frac{L_2}{2!}(z-1)^2 + \cdots$

where the right member, because of (3.4), is absolutely convergent for
$|z - 1| \le 1$. The coefficients on the right side of (4.1) are given by

(4.2)    $L_n = \mu_{(n)} - n\mu_{(n-1)}\lambda + \frac{n(n-1)}{2!}\mu_{(n-2)}\lambda^2 - \cdots + (-1)^n\lambda^n$

and the factorial moments may also be expressed by

(4.3)    $\mu_{(n)} = L_n + nL_{n-1}\lambda + \frac{n(n-1)}{2!}L_{n-2}\lambda^2 + \cdots + \lambda^n.$

These relations are readily derived by expressing (4.1) symbolically as

(4.4)    $e^{(\mu - \lambda)(z-1)} = e^{L(z-1)}$

where after expansion $\mu^n$ and $L^n$ are to be replaced by $\mu_{(n)}$ and $L_n$ respectively.
(Cf. Jordan [7], p. 39). From (4.1) and (3.4) there is now derived

(4.5)    $\varphi(z) = e^{\lambda(z-1)}\left(1 + L_1(z-1) + \frac{L_2}{2!}(z-1)^2 + \cdots\right).$





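Since the right member of (4.5) is absolutely and uniformly convergent for
$|z - 1| \le 1$ for arbitrary $\lambda$, it may be expressed as

(4.6)    $\varphi(z) = \left(1 + L_1\frac{\partial}{\partial\lambda} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2} + \cdots\right)e^{\lambda(z-1)}.$

Since the radius of convergence of the right member of (4.6) is some value $R_2$
such that $|z - 1| < R_2$, $R_2 \ge 1$, it may be expressed as a power series about z = 0, or

(4.7)    $\varphi(z) = \left(1 + L_1\frac{\partial}{\partial\lambda} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2} + \cdots\right)\left(e^{-\lambda} + z\frac{e^{-\lambda}\lambda}{1!} + z^2\frac{e^{-\lambda}\lambda^2}{2!} + \cdots\right).$

Recalling now the definition of $\varphi(z)$ as given in (3.1), there is obtained by equating
coefficients of like powers of z in (3.1) and (4.7)

(4.8)    $p(r) = \frac{e^{-\lambda}\lambda^r}{r!} + L_1\frac{\partial}{\partial\lambda}\frac{e^{-\lambda}\lambda^r}{r!} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2}\frac{e^{-\lambda}\lambda^r}{r!} + \cdots.$

Since it may be readily shown that

(4.9)    $\frac{\partial^n}{\partial\lambda^n}\frac{e^{-\lambda}\lambda^r}{r!} = (-1)^n\Delta^n\frac{e^{-\lambda}\lambda^r}{r!}$

where

$\Delta\frac{e^{-\lambda}\lambda^r}{r!} = \frac{e^{-\lambda}\lambda^r}{r!} - \frac{e^{-\lambda}\lambda^{r-1}}{(r-1)!}, \qquad \Delta^n\frac{e^{-\lambda}\lambda^r}{r!} = \Delta^{n-1}\frac{e^{-\lambda}\lambda^r}{r!} - \Delta^{n-1}\frac{e^{-\lambda}\lambda^{r-1}}{(r-1)!},$

we may also write (4.8) as

(4.10)    $p(r) = \frac{e^{-\lambda}\lambda^r}{r!} - L_1\Delta\frac{e^{-\lambda}\lambda^r}{r!} + \frac{L_2}{2!}\Delta^2\frac{e^{-\lambda}\lambda^r}{r!} - \frac{L_3}{3!}\Delta^3\frac{e^{-\lambda}\lambda^r}{r!} + \cdots.$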

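5. Necessity. Assume that the function p(r) of (2.1), for arbitrary $\lambda$, is
given by the absolutely convergent series

(5.1)    $p(r) = \left(1 + L_1\frac{\partial}{\partial\lambda} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2} + \cdots\right)\frac{e^{-\lambda}\lambda^r}{r!}.$

Since $e^{-\lambda}\lambda^r/r!$ is continuous with respect to $\lambda$, there follows, where z is a complex
variable and $|z| \le 1$,

(5.2)    $\sum_{r=0}^{\infty} z^r p(r) = \sum_{r=0}^{\infty} z^r\frac{e^{-\lambda}\lambda^r}{r!} + L_1\frac{\partial}{\partial\lambda}\sum_{r=0}^{\infty} z^r\frac{e^{-\lambda}\lambda^r}{r!} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2}\sum_{r=0}^{\infty} z^r\frac{e^{-\lambda}\lambda^r}{r!} + \cdots$

$= e^{\lambda(z-1)}\left(1 + L_1(z-1) + \frac{L_2}{2!}(z-1)^2 + \cdots\right)$

$= 1 + M_1(z-1) + \frac{M_2}{2!}(z-1)^2 + \frac{M_3}{3!}(z-1)^3 + \cdots$

where

(5.3)    $M_n = L_n + nL_{n-1}\lambda + \frac{n(n-1)}{2!}L_{n-2}\lambda^2 + \cdots + \lambda^n.$

From (5.2) it follows that

(5.4)    $M_n = \mu_{(n)}$

where $\mu_{(n)}$ is as defined in (3.3). Since (5.1) becomes for r = 0, $\lambda = 0$,

(5.5)    $1 - \mu_{(1)} + \frac{1}{2!}\mu_{(2)} - \frac{1}{3!}\mu_{(3)} + \cdots,$

the assumed absolute convergence implies that

(5.6)    $1 + |\mu_{(1)}| + \frac{1}{2!}|\mu_{(2)}| + \frac{1}{3!}|\mu_{(3)}| + \cdots + \frac{1}{n!}|\mu_{(n)}| + \cdots$

converges.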

6. Remarks. Obrechkoff [11] shows that his result includes those of Pollaczek-Geiringer
[12], Szegö [12] (p. 110) and Jacob [5]. His theorem states that if
the function p(r), (r = 0, 1, 2, ...), satisfies the following conditions

(6.1)    $\sum_{r=0}^{\infty} 2^r r^k |p(r)|$

is convergent for each finite number k, and

(4 X -)- " s -j Z 1 - ?J -— (e~ x X r / r 0~ 


(it + l)t£l' r 

tends toward zero as n increases indefinitely, then p(r) may be expressed in a
convergent Charlier Type B series.

Uspensky [18] shows that if

(6.3)    $\sum_{r=0}^{\infty} z^r p(r)$

has a radius of convergence R > 2 then p(r) may be expressed in a convergent
Charlier Type B series.

Schmidt [16] shows that a necessary and sufficient condition for the convergence
is that the function $\varphi(z)$ defined as in (3.1) (he does not impose the
condition (2.1) on p(r)) be regular inside the two circles $|z| < 1$ and $|z - 1| < 1$
and with all its derivatives is continuous on the peripheries. Assuming
that $p(r) \ge 0$, the condition (2.5) is stronger; in fact in this case Schmidt [16]
shows that a necessary and sufficient condition is that

$\lim_{r \to \infty} p(r)\,2^r r^k = 0$

for all integral $k \ge 0$. If $p(r) \ge 0$, then Uspensky's condition is only just
enough stronger than Schmidt's to keep it from being sufficient.

If (6.1) is satisfied, or if (6.3) is satisfied, then (3.1) is absolutely convergent
for $|z| \le 2$. Therefore, the point z = 2 is contained in the circle of convergence
of (3.2) or (3.4), which implies that

$1 + |\mu_{(1)}| + \frac{1}{2!}|\mu_{(2)}| + \cdots + \frac{1}{n!}|\mu_{(n)}| + \cdots$

converges.

It is deemed worthy of special mention to point out, as both Schmidt and
Uspensky have done, the striking fact that the necessary and sufficient condition
for the validity of (2.4) is independent of $\lambda$. This arbitrariness of $\lambda$ enables us
to dispose of it so as to obtain better convergence. Indeed if we set $\lambda = \mu_{(1)}$
then, as is evident from (4.2), $L_1 = 0$.


7. Special cases. It is of interest to note that (4.8) is the Taylor expansion
if $p(r) = e^{-\mu}\mu^r/r!$, (r = 0, 1, 2, ...), for then (4.2) becomes

(7.1)    $L_n = (\mu - \lambda)^n$

since for the Poisson Exponential Distribution $e^{-\mu}\mu^r/r!$, (r = 0, 1, 2, ...),
$\mu_{(n)} = \mu^n$, and (4.8) is then

(7.2)    $\frac{e^{-\mu}\mu^r}{r!} = \frac{e^{-\lambda}\lambda^r}{r!} + (\mu - \lambda)\frac{\partial}{\partial\lambda}\frac{e^{-\lambda}\lambda^r}{r!} + \frac{(\mu - \lambda)^2}{2!}\frac{\partial^2}{\partial\lambda^2}\frac{e^{-\lambda}\lambda^r}{r!} + \cdots.$

If p(r) is finite, that is if p(r) = 0 for r > n + 1, then $\mu_{(k)} = 0$ for k > n + 1.
Thus, for a finite function the condition (2.5) is satisfied.
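
The Poisson case gives a convenient numerical check of the series in its difference form (4.10). The sketch below is an illustration under stated assumptions: the values mu = 1.3, lam = 1.0, the truncation at 25 terms, and all function names are hypothetical. It builds $L_n$ from the factorial moments by (4.2), uses backward differences in r in place of derivatives in $\lambda$ as in (4.9), and compares the partial sums with the Poisson terms for mean mu.

```python
import math

def poisson(lam, r):
    """Poisson term e^{-lam} lam^r / r!, taken as 0 for r < 0."""
    return math.exp(-lam) * lam ** r / math.factorial(r) if r >= 0 else 0.0

def backward_difference(lam, r, n):
    """n-th backward difference in r of the Poisson term."""
    return sum((-1) ** k * math.comb(n, k) * poisson(lam, r - k) for k in range(n + 1))

def L_coefficients(mu_fact, lam):
    """L_n from factorial moments via (4.2); mu_fact[0] must be 1."""
    return [sum(math.comb(n, k) * mu_fact[n - k] * (-lam) ** k for k in range(n + 1))
            for n in range(len(mu_fact))]

def type_b_partial_sum(mu_fact, lam, r):
    """Partial sum of (4.10): sum over n of (-1)^n (L_n / n!) Delta^n psi(r)."""
    L = L_coefficients(mu_fact, lam)
    return sum((-1) ** n * L[n] / math.factorial(n) * backward_difference(lam, r, n)
               for n in range(len(L)))

mu, lam, terms = 1.3, 1.0, 25                    # hypothetical values
mu_fact = [mu ** n for n in range(terms + 1)]    # Poisson: mu_(n) = mu^n
approx = [type_b_partial_sum(mu_fact, lam, r) for r in range(6)]
exact = [poisson(mu, r) for r in range(6)]
print(max(abs(a - e) for a, e in zip(approx, exact)))
```

With these values $L_n = (0.3)^n$, so the terms fall off rapidly and a modest truncation is enough.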


8. Factorial moments. For functions p(r), (r = 0, 1, 2, ...), satisfying (2.5),
there may be derived from (3.1) and (3.4) the relation

(8.1)    $r!\,p(r) = \mu_{(r)} - \mu_{(r+1)} + \frac{1}{2!}\mu_{(r+2)} - \frac{1}{3!}\mu_{(r+3)} + \cdots, \quad (r = 0, 1, 2, \dots),$

since each side is $\varphi^{(r)}(0)$ derived respectively from (3.1) and (3.4). It should
be noted that for $\lambda = 0$ (4.5) leads to (8.1) rather than (4.8), so that (8.1) may
be considered as the Charlier Type B series for $\lambda = 0$. The result (8.1) was
derived for finite functions by Steffensen [17]. (Also compare Kaplansky [8]).
This may also be expressed symbolically by

(8.2)    $p(r) = \mu^r e^{-\mu}/r!, \quad (r = 0, 1, 2, \dots),$

where after expansion $\mu^n$ is to be replaced by $\mu_{(n)}$. It is of interest to note the
relation between the symbolic expression for p(r) as a Poisson Exponential in
(8.2) and the series (4.8), for (4.8) may be expressed symbolically as

(8.3)    $p(r) = e^{L\,\partial/\partial\lambda} \cdot \frac{e^{-\lambda}\lambda^r}{r!} = \frac{e^{-(\lambda + L)}(\lambda + L)^r}{r!} = \frac{\mu^r e^{-\mu}}{r!}$

since $e^{a(d/dx)}f(x) = f(x + a)$ and the relations (4.2), (4.3), (4.4).

9. Illustrations. Consider the function

(9.1)    $p(r) = 1/2^{r+1}, \quad (r = 0, 1, 2, \dots).$

For this function

(9.2)    $\varphi(z) = \sum_{r=0}^{\infty} z^r p(r) = 1/(2 - z)$

and

(9.3)    $\varphi^{(n)}(1) = \mu_{(n)} = n!$

so that (2.5) becomes

(9.4)    $1 + 1 + 1 + \cdots$

which does not converge. (It may be of interest to note that for this case
(8.1) yields

(9.5)    $p(0) = 1 - 1 + 1 - 1 + 1 - \cdots.$

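The series on the right in (9.5) is not convergent but is summable $C_1$ to 1/2. For
the latter see for example R. P. Agnew, [19].) In this case the first several
coefficients of (4.8) are, for $\lambda = 1$,

(9.6)    $L_1 = 0$, $\frac{L_2}{2!} = .5000$, $\frac{L_3}{3!} = .3333$, $\frac{L_4}{4!} = .3750$, $\frac{L_5}{5!} = .3667$, $\frac{L_6}{6!} = .3681$, $\frac{L_7}{7!} = .3679$.

Let us now consider the function

(9.7)    $p(0) = \tfrac{1}{2}, \quad p(r) = 1/3^r, \quad (r = 1, 2, \dots).$

For this function

(9.8)    $\varphi(z) = \sum_{r=0}^{\infty} z^r p(r) = \frac{1}{2} + \frac{z}{3 - z}$

and

(9.9)    $\varphi^{(n)}(1) = \mu_{(n)} = \frac{3}{2}\,\frac{n!}{2^n}, \quad (n = 1, 2, \dots),$

so that (2.5) becomes

(9.10)    $1 + \frac{3}{4} + \frac{3}{8} + \frac{3}{16} + \cdots$

which converges. For this case (8.1) yields

(9.11)    $p(0) = 1 - \frac{3}{2}\,\frac{1}{2} + \frac{3}{2}\,\frac{1}{2^2} - \cdots = \frac{1}{2}, \qquad p(1) = \frac{3}{2}\,\frac{1}{2} - \frac{3}{2}\,\frac{2!}{2^2} + \frac{3}{2}\,\frac{3!}{2!\,2^3} - \cdots = \frac{1}{3},$

etc.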

In this case, the first several coefficients of (4.8) are, for $\lambda = 0.75$,

(9.12)    $L_1 = 0$, $\frac{L_2}{2!} = .093750$, $\frac{L_3}{3!} = .046875$, $\frac{L_4}{4!} = .019043$, $\frac{L_5}{5!} = .010840$, $\frac{L_6}{6!} = .005173$, $\frac{L_7}{7!} = .002622$.
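
The tabulated coefficients can be recomputed from (4.2). The sketch below is a check under an assumption about the damaged text: it takes the first function as p(r) = 1/2^(r+1) with factorial moments n!, and reads the second function as p(0) = 1/2, p(r) = 1/3^r, whose factorial moments are (3/2) n!/2^n. The function name `scaled_L` is hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def scaled_L(mu_fact, lam, n_max):
    """Return [L_n / n! for n = 0..n_max], with L_n computed from (4.2)."""
    out = []
    for n in range(n_max + 1):
        L_n = sum(comb(n, k) * mu_fact(n - k) * (-lam) ** k for k in range(n + 1))
        out.append(L_n / factorial(n))
    return out

# First illustration: mu_(n) = n!, lambda = 1.
first = scaled_L(lambda n: Fraction(factorial(n)), Fraction(1), 7)

# Second illustration (as read here): mu_(0) = 1, mu_(n) = (3/2) n!/2^n, lambda = 3/4.
second = scaled_L(
    lambda n: Fraction(1) if n == 0 else Fraction(3 * factorial(n), 2 ** (n + 1)),
    Fraction(3, 4), 7)

for name, coeffs in (("first", first), ("second", second)):
    print(name, [round(float(c), 6) for c in coeffs[1:]])
```

In the first case $L_n$ is the number of derangements of n objects, so $L_n/n!$ approaches $e^{-1} = .3679$ and the Type B coefficients do not tend to zero; in the second they decrease steadily.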

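Let us now consider the function (suggested by Prof. C. Wexler)

(9.13)    $p(0) = \frac{5}{3}, \quad p(r) = (-1)^r\,\frac{5}{3}\left(\frac{2}{3}\right)^r, \quad (r = 1, 2, \dots).$

For this function

(9.14)    $\sum_{r=0}^{\infty} p(r) = 1, \qquad \sum_{r=0}^{\infty} |p(r)| = 5,$

(9.15)    $\varphi(z) = \sum_{r=0}^{\infty} z^r p(r) = 5/(3 + 2z),$

(9.16)    $\varphi^{(n)}(1) = \mu_{(n)} = (-1)^n\,n!\,(2/5)^n.$

In this case (2.5) becomes

(9.17)    $1 + \frac{2}{5} + \left(\frac{2}{5}\right)^2 + \left(\frac{2}{5}\right)^3 + \cdots$

which converges, and (8.1) yields

(9.18)    $p(0) = 1 + \frac{2}{5} + \left(\frac{2}{5}\right)^2 + \cdots = \frac{5}{3}, \qquad p(1) = -\frac{2}{5} - 2!\left(\frac{2}{5}\right)^2 - \frac{3!}{2!}\left(\frac{2}{5}\right)^3 - \cdots = -\frac{5}{3}\cdot\frac{2}{3},$

etc.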

Note that for this case (6.1) or (6.3) are not satisfied. Using $\lambda = 1$, it is
found that

(9.19)    $L_1 = -1.4$, $\frac{L_2}{2!} = 1.06$, $\frac{L_3}{3!} = -.5907$, $\frac{L_4}{4!} = .2779$,

NOTES

This section is devoted to brief research and expository articles on methodology
and other short items.

ON SMALL-SAMPLE ESTIMATION

By George W. Brown
Iowa State College

1. Summary. This paper discusses some of the concepts underlying small 
sample estimation and reexamines, in particular, the current notions on “un¬ 
biased” estimation. Alternatives to the usual unbiased property are examined 
with respect to invariance under simultaneous one-to-one transformation of 
parameter and estimate; one of these alternatives, closely related to the maxi¬ 
mum likelihood method, seems to be new. The property of being unbiased in 
the likelihood sense is essentially equivalent to the statement that the estimate 
is a maximum likelihood estimate based on some distribution derived by inte¬ 
gration from the original sampling distribution, by virtue of a “hereditary” 
property of maximum likelihood estimation. 

An exposition of maximum likelihood estimation is given in terms of optimum
pairwise selection with equal weights, providing a type of rationale for small
sample estimation by maximum likelihood.

2. Introduction. In large sample theory of estimation the problems are
generally formulated in terms of a random variable $x = (x_1, x_2, \dots, x_n)$ and a
product distribution with, say, a density $g(x|\theta) = f(x_1|\theta)f(x_2|\theta)\cdots f(x_n|\theta)$
where n is permitted to increase without limit. For small sample theory it is
sufficient to consider an arbitrary distribution, not necessarily of product form,
depending on a parameter $\theta$. For convenience we will assume a distribution
density of fixed form $g(x|\theta)$, where x is in Euclidean n-space and $\theta$ in Euclidean
k-space, $k \le n$. Granting at the outset that a complete rationale for estimation
must be based on considerations like those of Wald [4, 1939] dealing with specified
risk functions, it is still a difficult process, in practice, to specify the risk functions
and solve the ensuing mathematical problems. It may still be to the point, then,
to consider general properties that estimates might be required to have in order
to be considered “acceptable”, or perhaps even “optimum”, over a class of
“acceptable” estimates.

In large-sample theory the situation is fairly simple. Consistent estimates
have the property that the estimate converges in probability to the true parameter
value. “Best” or “optimum” estimates are defined in terms of the order
of convergence, or asymptotic variance. All reasonable definitions of “optimum”
become asymptotically equivalent, since they all measure essentially the rate of
convergence, so that one might ask for least variance, least mean
deviation, or least expected kth power, without affecting the optimum estimate
in general. Moreover the consistency property and the optimum properties
are in general invariant under simultaneous one-to-one transformation of the
parameter and its estimate, i.e., the square of an asymptotically optimum estimate
of $\sigma$ will be an asymptotically optimum estimate of $\sigma^2$. Finally a general
estimation method, the method of maximum likelihood, leads to optimum estimates
in large samples.


In small samples, on the other hand, the search for corresponding criteria has 
led to the investigation of best “unbiased” estimates, and the like, where few, 
if any, of the definitions discussed possess an invariance property under simul¬ 
taneous one-to-one transformation of the parameter and its estimate. 


3. Unbiased estimation. To ensure, in small-sample estimation, that an
estimate bears some relation to the parameter it is estimating, it has become the
custom to require that an estimate be unbiased, which means that the expected
value of the estimate agrees with the parameter value. This condition was suggested
by the consistency property which is required in large-sample estimation.
It ensures, moreover, that the average of a large number of independent estimates
made on the same basis will provide a consistent estimate, in the large sample
sense. While this consistency property of the average may at times be convenient
in practical situations, the fact remains that the problem of estimation from
a number of such observations is a different estimation problem, the “best”
solution to which need not be the average of the “best” solutions of the original
problem corresponding to estimation of $\theta$ from a single observation on x, where
x has a density $g(x|\theta)$. More to the point, however, is the objection that an
unbiased estimate of a parameter does not in general transform into an unbiased
estimate when both estimate and parameter are subjected to the same one-to-one
transformation. Moreover, one can easily construct situations for which the
only acceptable unbiased estimates are clearly inferior, from almost any point
of view, to estimates which are biased (Girshick, Mosteller and Savage [1, 1946],
and Halmos [2, 1946]).

It may be of interest to consider a few reasonable alternatives to the lack of
bias requirement, which seem to accomplish as much as the conventional definition
and which, in addition, have an invariance under one-to-one transformation
of the parameter and estimate. To avoid confusion, let us attach the qualifying
prefix “mean” to the usual unbiased property, so that an estimate will be said
to be mean-unbiased if its expected value agrees with the parameter value.

Consider as one alternative the following property. An estimate of a one-dimensional
parameter $\theta$ will be said to be median-unbiased if, for fixed $\theta$, the
median of the distribution of the estimate is at the value $\theta$, i.e., the estimate
underestimates just as often as it overestimates. This requirement seems for
most purposes to accomplish as much as the mean-unbiased requirement and
has the additional property that it is invariant under one-to-one transformation.

A different alternative requirement which is invariant under transformations
is suggested by the definition of unbiased tests of significance (Neyman and
Pearson [3, 1936]). Let us say that an estimate is likelihood-unbiased if
$h(\theta\,|\,\theta') \le h(\theta\,|\,\theta)$, where the estimate $\hat{\theta}$ has probability density $h(\hat{\theta}\,|\,\theta)$. In other words, an
estimation method is likelihood-unbiased if estimates in the neighborhood of a
given parameter value $\theta$ would occur more frequently when the true value is
itself $\theta$ than when it differs from $\theta$. On intuitive grounds this seems to be an
acceptable kind of requirement, applicable to a very general class of estimation
problems. It is evident that the assumption of a density plays no important
role here; the situation is analogous to the maximum likelihood situation. The
property itself is invariant under simultaneous one-to-one transformations of
parameter and estimate for the same reason that maximum likelihood estimates
are invariant under such transformations; in fact one can readily see that the
likelihood-unbiased condition is equivalent to requiring that $\hat{\theta}$ have such a
distribution, as a function of $\theta$, that the maximum likelihood estimate of $\theta$
based on $\hat{\theta}$ will be actually equal to $\hat{\theta}$. The obvious implication of this fact is
that if a function $\phi(x)$ is given (possibly a sufficient statistic for $\theta$) then there is
an essentially unique likelihood-unbiased estimate $\hat{\theta}$ based on $\phi$, obtained by
finding the maximum likelihood estimate of $\theta$ in the distribution of $\phi$ as a function
of $\theta$.

As an example, consider the estimation of $\sigma^2$ from a sample of n observations
from a normal distribution. Let $S^2$ be the usual sum of squares, where $S^2/\sigma^2$
is distributed like $\chi^2$ on n - 1 degrees of freedom. Then the only likelihood-unbiased
estimate of $\sigma^2$ based on $S^2$ is $S^2/(n - 1)$. In this case $S^2/(n - 1)$ is
also mean-unbiased, a fact which is normally quoted as justification for the
division by n - 1. Curiously enough, it is customary to estimate $\sigma$ by
$\sqrt{S^2/(n - 1)}$, even though this is a biased estimate of $\sigma$, according to the usual
notion of “unbiased”, referred to here as “mean-unbiased”. On the other hand,
$\sqrt{S^2/(n - 1)}$ is a perfectly good likelihood-unbiased estimate of $\sigma$, by virtue
of the invariance under transformations. It might be pointed out, in passing,
that the estimate $S^2/(n - 1)$ does not have minimum mean square error about $\sigma^2$,
but that the optimum divisor for minimizing the mean square error about $\sigma^2$
is n + 1.
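
The statement about the optimum divisor follows from the first two moments of $\chi^2$ on n - 1 degrees of freedom, namely n - 1 and (n - 1)(n + 1). The sketch below is only a numerical illustration; the sample size n = 10, the range of candidate divisors, and the function name are hypothetical. It evaluates the mean square error of $S^2/c$ about $\sigma^2$, in units of $\sigma^4$, and picks the best integer divisor.

```python
from fractions import Fraction

def mse_of_divisor(n, c):
    """E[(S^2/c - sigma^2)^2] / sigma^4 when S^2/sigma^2 is chi-square on n-1 d.f."""
    nu = n - 1
    first_moment = Fraction(nu)              # E[chi-square]
    second_moment = Fraction(nu * (nu + 2))  # E[chi-square squared]
    return second_moment / c ** 2 - 2 * first_moment / c + 1

n = 10                                       # hypothetical sample size
candidates = range(n - 3, n + 6)
mse = {c: mse_of_divisor(n, c) for c in candidates}
best = min(mse, key=mse.get)

# Relative bias E[S^2/c]/sigma^2 - 1 for the three divisors usually discussed.
bias = {c: Fraction(n - 1, c) - 1 for c in (n - 1, n, n + 1)}
print(best, mse[n - 1], mse[n + 1], bias)
```

The divisor n - 1 gives zero bias with mean square error $2\sigma^4/(n - 1)$, while n + 1 gives the smaller value $2\sigma^4/(n + 1)$ at the cost of a negative bias.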

The fact that a likelihood-unbiased estimate is the maximum likelihood estimate
based on the distribution of the estimate itself suggests further examination
of maximum likelihood estimates. If we define a simple estimate as one which
completely determines a probability distribution for x, then we have as a theorem,
the following:

A simple maximum likelihood estimate $\hat{\theta}(x)$ is likelihood-unbiased. What this
means is essentially that maximum likelihood is “hereditary”, i.e. if $\hat{\theta}(x)$ maximizes
$g(x\,|\,\theta)$ in a space of n dimensions, and $\hat{\theta}$ has a derived density $h(\hat{\theta}\,|\,\theta)$
in a space of $k \le n$ dimensions, then $\theta = \hat{\theta}$ maximizes $h(\hat{\theta}\,|\,\theta)$. The proof follows
readily from the fact that $h(\hat{\theta}\,|\,\theta)$ is obtained by integration of $g(x\,|\,\theta)$ over all
x such that $\hat{\theta}(x) = \hat{\theta}$.

The example of estimating $\sigma^2$, quoted above, shows that the word “simple”
cannot be omitted from the statement above. For example, the simple estimate
in the parent distribution is the joint estimate $(\bar{x}, S^2/n)$ of $(\mu, \sigma^2)$ and in fact the
joint estimate is likelihood-unbiased. On the other hand, $S^2/n$ is not a simple
maximum likelihood estimate, and we observe that $S^2/n$ is not likelihood-unbiased.
$S^2/(n - 1)$ is a simple maximum likelihood estimate of $\sigma^2$ based on
the distribution of $S^2$ itself, so that $S^2/(n - 1)$ is, as a result, likelihood-unbiased.

One can exhibit situations in which the conventional mean-unbiased property
is very unnatural, while the likelihood-unbiased property may be quite natural.
Consider, for example, the case where $\sigma^2$ is to be estimated by use of a $\chi^2$-distributed
$S^2$ with n - 1 degrees of freedom, but subject to the condition $\sigma^2 \ge \sigma_0^2$,
where $\sigma_0^2$ is known in advance. Then the estimate $\hat{\sigma}^2 = \max[S^2/(n - 1), \sigma_0^2]$ is
certainly biased according to conventional definitions, but is nevertheless
likelihood-unbiased. To get a mean-unbiased estimate when $\sigma^2$ is near to $\sigma_0^2$ is
impossible except by admitting estimates less than $\sigma_0^2$, which is clearly foolish if it is
known that $\sigma^2 \ge \sigma_0^2$.

It may be of interest to include a brief discussion of maximum likelihood estimation
in terms of pairwise selection of alternatives, providing a sort of optimum
property for maximum likelihood estimation in small samples, in addition to the
likelihood-unbiased property. Consider a choice to be made between only two
alternative values of $\theta$, say $\theta_0$ and $\theta_1$, by dividing the sample space into two
regions $S_0$ and $S_1$, such that $\theta_0$ is accepted when x falls in $S_0$ and $\theta_1$ is accepted
when x falls in $S_1$. Then

$P_{\theta_0}(S_0) + P_{\theta_0}(S_1) = P_{\theta_1}(S_0) + P_{\theta_1}(S_1) = 1.$

$P_{\theta_1}(S_0)$ is the probability of making the error of accepting $\theta_0$ when $\theta = \theta_1$ and
$1 - P_{\theta_0}(S_0)$ is the probability of making the error of accepting $\theta_1$ when $\theta = \theta_0$.
If the two errors are weighted equally, it is evident that a “best” test will choose
$S_0$ so as to minimize $P_{\theta_1}(S_0) + 1 - P_{\theta_0}(S_0)$. It is well known that $S_0$ will
minimize the indicated quantity if $S_0$ consists of all points x such that $g(x\,|\,\theta_0) \ge
g(x\,|\,\theta_1)$. Thus we may speak of the region $S_0$ defined by $g(x\,|\,\theta_0) \ge g(x\,|\,\theta_1)$
as an optimum equal risk acceptance region for $\theta_0$ against $\theta_1$. Now if we transfer
our attention to the general estimation problem we see that the maximum
likelihood estimate $\hat{\theta}(x)$ is that value of $\theta$ which would be accepted by the optimum
equal risk acceptance procedure against all other $\theta$'s.
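
The optimality of the region $g(x\,|\,\theta_0) \ge g(x\,|\,\theta_1)$ can be confirmed by brute force on a small discrete sample space. The sketch below is an illustration only: the five-point sample space and the two probability tables g0 and g1 are hypothetical. It enumerates every possible acceptance region $S_0$ and compares the smallest equally weighted error sum with the one obtained from the likelihood comparison.

```python
from itertools import combinations

# Hypothetical discrete sample space {0, 1, 2, 3, 4} with two candidate distributions.
g0 = [0.40, 0.30, 0.15, 0.10, 0.05]   # g(x | theta_0)
g1 = [0.05, 0.15, 0.20, 0.25, 0.35]   # g(x | theta_1)
points = range(len(g0))

def total_error(S0):
    """P_theta1(S0) + 1 - P_theta0(S0): the equally weighted sum of the two errors."""
    return sum(g1[x] for x in S0) + 1 - sum(g0[x] for x in S0)

# Region defined by the likelihood comparison.
likelihood_region = tuple(x for x in points if g0[x] >= g1[x])

# Exhaustive search over all 2^5 candidate regions.
all_regions = [S for k in range(len(g0) + 1) for S in combinations(points, k)]
best_error = min(total_error(S) for S in all_regions)

print(likelihood_region, total_error(likelihood_region), best_error)
```

Each point x contributes $g(x\,|\,\theta_1) - g(x\,|\,\theta_0)$ to the error sum when it is placed in $S_0$, which is why including exactly the points with a non-positive contribution is optimal.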
A NOTE ON REGRESSION ANALYSIS

By Abraham Wald 
Columbia University 

1. Introduction. In regression analysis a set of variables $y, x_1, \dots, x_p$
is considered where y is called the dependent variable and $x_1, \dots, x_p$ are the
independent variables. Let $y_\alpha$ denote the $\alpha$-th observation on y and $x_{i\alpha}$ the
$\alpha$-th observation on $x_i$, $(i = 1, \dots, p;\ \alpha = 1, \dots, N)$. The observations $x_{i\alpha}$
are treated as given constants, while the observations $y_1, \dots, y_N$ are regarded
as chance variables. The following two assumptions are usually made concerning
the joint distribution of the variates $y_1, \dots, y_N$:

(a) The variates $y_1, \dots, y_N$ are normally and independently distributed with
a common unknown variance $\sigma^2$.

(b) The expected value of $y_\alpha$ is equal to $\beta_1x_{1\alpha} + \cdots + \beta_px_{p\alpha}$ where $\beta_1, \dots,
\beta_p$ are unknown constants.

In some problems it seems reasonable to assume that the regression coefficients
$\beta_1, \dots, \beta_p$ are not constants, but chance variables. This leads to a different
probability model for regression analysis and the object of this note is to discuss
certain aspects of this model. In what follows in this note we shall make the
following assumptions concerning the joint distribution of the chance variables
$y_1, \dots, y_N, \beta_1, \dots, \beta_p$.

Assumption 1. For given values of $\beta_1, \dots, \beta_p$ the joint conditional probability
density function of $y_1, \dots, y_N$ is given by

(1.1)    $\frac{1}{(2\pi)^{N/2}\sigma^N}\exp\left[-\frac{1}{2\sigma^2}\sum_{\alpha=1}^{N}(y_\alpha - \beta_1x_{1\alpha} - \cdots - \beta_px_{p\alpha})^2\right].$

Assumption 2. The regression coefficients $\beta_1, \dots, \beta_p$ are independently
distributed.

Assumption 3. The regression coefficients $\beta_1, \dots, \beta_r$, $(r \le p)$, are normally
distributed with zero means and a common variance $\sigma'^2$.

The purpose of this note is to derive confidence limits for the ratio $\sigma'^2/\sigma^2$. Such
confidence limits have been derived by the author [1] for analysis of variance
problems assuming that there are only main effects but no interactions. The
regression problem treated in the present note is much more general and includes
all the analysis of variance problems with or without interactions as
special cases.

It should be remarked that Assumptions 2 and 3 do not exclude the case where
$\beta_{r+1}, \dots, \beta_p$ are constants.


2. Derivation of confidence limits for the ratio $\sigma'^2/\sigma^2$. Let $b_1, \dots, b_p$ be the
sample estimates of $\beta_1, \dots, \beta_p$ obtained by the method of least squares. We
shall denote the difference $b_i - \beta_i$ by $\epsilon_i$, $(i = 1, \dots, p)$. It is known that for
given values of $\beta_1, \dots, \beta_p$ the conditional joint distribution of $\epsilon_1, \dots, \epsilon_p$ is
normal with zero means and variance-covariance matrix $\|c_{ij}\|\sigma^2$ where

(2.1)    $\|c_{ij}\| = \|a_{ij}\|^{-1}$

and

(2.2)    $a_{ij} = \sum_{\alpha=1}^{N} x_{i\alpha}x_{j\alpha}, \quad (i, j = 1, \dots, p).$

Since the conditional distribution of $\epsilon_1, \dots, \epsilon_p$ does not depend on the values
of $\beta_1, \dots, \beta_p$, the unconditional distribution of $\epsilon_1, \dots, \epsilon_p$ is the same as the
conditional one, and the set of variates $(\beta_1, \dots, \beta_p)$ is independently distributed
of the set $(\epsilon_1, \dots, \epsilon_p)$. From this and Assumptions 2 and 3 it follows
that $b_1, \dots, b_r$ have a joint normal distribution and that

(2.3)    $Eb_i = 0, \quad (i = 1, \dots, r)$

and

(2.4)    $Eb_ib_j = \left(c_{ij} + \delta_{ij}\frac{\sigma'^2}{\sigma^2}\right)\sigma^2, \quad (i, j = 1, \dots, r)$

where $\delta_{ij} = 0$ for $i \ne j$ and $= 1$ for $i = j$.

We shall denote $\sigma'^2/\sigma^2$ by $\lambda$ and the elements of the inverse of $\|c_{ij} + \delta_{ij}\lambda\|$ by
$d_{ij}(\lambda)$, i.e.,

(2.5)    $\|d_{ij}(\lambda)\| = \|c_{ij} + \delta_{ij}\lambda\|^{-1}, \quad (i, j = 1, \dots, r).$

Then the quadratic form

(2.6)    $Q(\lambda) = \frac{1}{\sigma^2}\sum_{j=1}^{r}\sum_{i=1}^{r} d_{ij}(\lambda)b_ib_j$

has the $\chi^2$ distribution with r degrees of freedom.

It is known that for any given values of $\beta_1, \dots, \beta_p$, $b_1, \dots, b_p$ the quadratic
form

(2.7)    $Q_a = \frac{1}{\sigma^2}\sum_{\alpha=1}^{N}(y_\alpha - b_1x_{1\alpha} - \cdots - b_px_{p\alpha})^2$

has the $\chi^2$ distribution with N - p degrees of freedom provided that the rank
of the matrix $\|x_{i\alpha}\|$ is p. Hence $Q_a$ and $Q(\lambda)$ are independently distributed
and the ratio

(2.8)    $F = \frac{N - p}{r}\,\frac{Q(\lambda)}{Q_a}$

has the F-distribution with r and N - p degrees of freedom.
Let Ft and F 2 be two values chosen so that 

(2,0) Prob. (Fi Ss F ^ F2I = 0 
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where c is a given positive constant less than 1. Then the set of all values X 
for which the inequality 

(2.10) I<\ < QM < f,\ 

r 

holds forms a confidence set for X with tire confidence coefficient c. 

We shall now show that $Q(\lambda)$ is a monotonic function of $\lambda$ and, therefore,
the confidence set determined by (2.10) is an interval.

Let $\| g_{ij} \|$, $(i, j = 1, \cdots, r)$, be an orthogonal matrix and let

(2.11) $b_i^* = \sum_{j=1}^{r} g_{ij} b_j$.

It then follows from (2.3) and (2.4) that

(2.12) $E(b_i^*) = 0$, $(i = 1, \cdots, r)$

and

(2.13) $E(b_i^* b_j^*) = (c_{ij}^* + \delta_{ij} \lambda) \sigma^2$, $(i, j = 1, \cdots, r)$

where

(2.14) $c_{ij}^* = \sum_{l=1}^{r} \sum_{k=1}^{r} g_{ik} c_{kl} g_{jl}$.

Let

(2.15) $\| d_{ij}^*(\lambda) \| = \| c_{ij}^* + \delta_{ij} \lambda \|^{-1}$, $(i, j = 1, \cdots, r)$

and put

$Q^*(\lambda) = \frac{1}{\sigma^2} \sum_{j=1}^{r} \sum_{i=1}^{r} d_{ij}^*(\lambda) b_i^* b_j^*$.

It is easy to verify that $Q^*(\lambda)$ is identically equal to $Q(\lambda)$. Hence, to prove
the monotonicity of $Q(\lambda)$, it is sufficient to show that $Q^*(\lambda)$ is a monotonic
function of $\lambda$. Since no restrictions as to the choice of the orthogonal matrix $\| g_{ij} \|$
are made, we shall choose it so that the matrix $\| c_{ij}^* \|$ becomes diagonal, i.e.,
$c_{ij}^* = 0$ for $i \neq j$, $(i, j = 1, \cdots, r)$. Then

(2.16) $d_{ij}^*(\lambda) = 0$ for $i \neq j$

and

(2.17) $d_{ii}^*(\lambda) = \frac{1}{c_{ii}^* + \lambda}$.

Hence

(2.18) $Q(\lambda) = Q^*(\lambda) = \frac{1}{\sigma^2} \sum_{i=1}^{r} \frac{b_i^{*2}}{c_{ii}^* + \lambda}$

is a monotonically decreasing function of $\lambda$. The confidence set determined by
(2.10) is, therefore, an interval.

The upper end point of the confidence interval is the root in $\lambda$ of the equation

(2.19) $\frac{N - p}{r} \cdot \frac{Q(\lambda)}{Q_a} = F_1$

and the lower end point is the root in $\lambda$ of the equation

(2.20) $\frac{N - p}{r} \cdot \frac{Q(\lambda)}{Q_a} = F_2$.

If equation (2.20) has no root, the lower end point of the confidence interval
is put equal to zero.
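Since the statistic in (2.10) is decreasing in the variance ratio, its two end points can be located by bisection. The following Python sketch is only an illustration under stated assumptions: it takes the transformed estimates and the diagonal elements of the diagonalized matrix as already computed (the names `b_star`, `c_star`, `residual_ss` are hypothetical), and it expects the two F percentage points to be supplied from tables. The common factor $\sigma^2$ cancels in the ratio, so `residual_ss` stands for $\sigma^2 Q_a$.

```python
def variance_ratio_statistic(lam, b_star, c_star, residual_ss, n_obs, p):
    """(N - p)/r * Q(lam)/Q_a computed from the diagonal form (2.18).

    sigma^2 cancels in Q(lam)/Q_a, so residual_ss stands for sigma^2 * Q_a.
    """
    r = len(b_star)
    weighted = sum(b * b / (c + lam) for b, c in zip(b_star, c_star))
    return (n_obs - p) / r * weighted / residual_ss


def root_of_decreasing(f, target, tol=1e-13):
    """Solve f(lam) = target (target > 0) on lam >= 0 for a decreasing f.

    Returns None when f(0) < target, i.e. when there is no root.
    """
    lo, hi = 0.0, 1.0
    if f(lo) < target:
        return None
    while f(hi) > target:          # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def confidence_interval(b_star, c_star, residual_ss, n_obs, p, f1, f2):
    """Interval for lam = sigma'^2 / sigma^2 defined by (2.10)."""
    def f(lam):
        return variance_ratio_statistic(lam, b_star, c_star, residual_ss, n_obs, p)

    upper = root_of_decreasing(f, f1)    # equation with the smaller F value
    lower = root_of_decreasing(f, f2)    # equation with the larger F value
    if upper is None:
        return None                      # no lam >= 0 satisfies (2.10)
    return (0.0 if lower is None else lower, upper)


# Made-up inputs, purely for illustration.
print(confidence_interval([1.0, 2.0], [0.5, 0.25], 10.0, 12, 2, f1=0.5, f2=4.0))
```

When the larger F value exceeds the statistic at zero, the lower end point is returned as zero, in line with the convention stated above.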
ON THE SHAPE OF THE ANGULAR CASE OF CAUCHY'S
DISTRIBUTION CURVES

By Aurel Wintner

The Johns Hopkins University

1. Let $\xi$ be a linear random variable, that is, a random variable capable of
values $x$ represented by points of a line $-\infty < x < \infty$, and suppose, for
simplicity, that $\xi$ has a density of probability, $f(x)$. Then, subject to provisos of
convergence, the series

$F(x) = \sum_{n=-\infty}^{\infty} f(x + n)$

represents a periodic function, of period 1, having the following significance:
$F(x)$ is the density of probability of the angular random variable, say $\Xi$, which
is obtained if all the states

$\cdots, \xi - 2, \xi - 1, \xi, \xi + 1, \xi + 2, \cdots$

of the linear random variable are identified.

In other words, if a circle of unit circumference rolls from $-\infty$ to $\infty$ on the
$\xi$-line, then every point of the circumference collects the various densities of
probability attached to congruent points of the $\xi$-line, and a state of $\Xi$
represents a point of the circumference. For a detailed study of the mapping
$\xi \to \Xi$, cf. [...].

According to Poisson's summation formula, the Fourier constants of the
periodic function $F(x)$ can be obtained by restricting $u$ in $g(u)$ to an equidistant
sequence of discrete values, where $g(u)$ denotes the Fourier transform of $f(x)$;
cf., e.g., [5], p. 78 or [9], pp. 477-478.

2. Consider, in particular, the case in which $f(x)$ is the density of a symmetric
distribution which is stable in Cauchy's sense. The determination of the totality
of these linear densities of probability is due to Lévy [6]. It was shown in [8]
that every such $f(x) = f(-x)$ is a decreasing function of $|x|$. As explained in
[8], p. 70, this fact makes superfluous one of the axioms occurring in Gauss'
postulational approach to "errors of observation."

The purpose of the present note is the deduction of the angular analogue of
the fact just quoted. The analogue states that, if $f(x)$ is symmetric and stable,
then the corresponding periodic $F(x)$ is decreasing for $0 \leq x \leq \frac{1}{2}$ (and so, for
reasons of symmetry, is increasing for $\frac{1}{2} \leq x \leq 1$). This is contained in the
italicized statement of §4 below.

In view of Poisson's rule, quoted above, the periodic densities in question can
be defined by certain Fourier series representing generalizations of elliptic
theta-series. From this point of view, not even the existence (i.e., the positivity) of
the periodic densities is obvious, if arbitrary values of the "precision constant"
(denoted below by $q$) are allowed. The difficulties involved are explained in §3.
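As a concrete illustration of the reduction and of Poisson's rule, the sketch below treats the Cauchy density itself, which is one member of the symmetric stable family. It compares the truncated sum of $f(x + n)$ with the Fourier series whose coefficients are the values $e^{-2\pi\gamma n}$ of the Fourier transform at the integers. The scale parameter `gamma`, the function names and the truncation lengths are assumptions made for this example only.

```python
import math


def cauchy_density(x, gamma):
    """Linear Cauchy density with scale gamma."""
    return gamma / (math.pi * (gamma * gamma + x * x))


def wrapped_by_summation(x, gamma, terms=20000):
    """F(x) = sum over n of f(x + n), truncated to |n| <= terms."""
    return sum(cauchy_density(x + n, gamma) for n in range(-terms, terms + 1))


def wrapped_by_fourier(x, gamma, terms=200):
    """1 + 2 * sum rho**n * cos(2 pi n x), with rho = exp(-2 pi gamma)."""
    rho = math.exp(-2.0 * math.pi * gamma)
    return 1.0 + 2.0 * sum(
        rho ** n * math.cos(2.0 * math.pi * n * x) for n in range(1, terms + 1)
    )


gamma = 0.1
for x in (0.0, 0.1, 0.25, 0.5):
    # The two columns should agree up to the truncation error of the slow direct sum.
    print(x, wrapped_by_summation(x, gamma), wrapped_by_fourier(x, gamma))
```

The direct sum converges only like the reciprocal of the number of terms, which is one practical reason for working with the Fourier form.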

3. If $q$ and $\lambda$ are positive constants the first of which is less than 1, then the
(even, periodic) function

(1) $\theta_\lambda(x; q) = 1 + 2 \sum_{n=1}^{\infty} q^{n^\lambda} \cos nx$,

where $q^{n^\lambda} > 0$, has derivatives of arbitrarily high order at every real $x$. It is
regular-analytic at every real $x$ if and only if $\lambda > 0$ is replaced by $\lambda \geq 1$, where
the sign of equality holds if and only if the analytic continuation (from the $x$-axis)
is not an entire function. In fact, it is known that a Fourier series
$\sum (a_n \cos nx + b_n \sin nx)$ is that of a function which is regular-analytic at every
real $x$, and has the period $2\pi$, if and only if $|a_n| + |b_n|$ is majorized by a
constant multiple of the $n$th power of a positive constant which is less than 1;
and that the latter constant can be chosen arbitrarily small if and only if the
analytic continuation does not lead to any singularity (at a $z \neq \infty$).

Since the function (1) tends to 1 uniformly in $x$ as $q \to +0$, if $\lambda$ is fixed, there
belongs to every $\lambda > 0$ a positive $q^* = q^*(\lambda)$ having the property that

(2) $\theta_\lambda(x; q) > 0$ for $0 \leq x < 2\pi$

if $0 < q < q^*(\lambda)$. It is less obvious that, if $q$ is sufficiently small with reference
to $\lambda$, say if $0 < q < q^{**}(\lambda)$, then

(3) $\theta_\lambda(x; q)$ is decreasing for $0 \leq x \leq \pi$

(hence, increasing for $\pi \leq x < 2\pi$). The existence of such a $q^{**}(\lambda)$ for
every $\lambda > 0$ can be assured as follows:

If $s_n(x)$ denotes the $n$th partial sum of the Fourier series $\sum (\sin nx)/n$, then
$s_n(x)$ is positive for $0 < x < \pi$ (Gronwall, Jackson; for a short proof, cf. [4]).
Hence, a partial summation shows that the sum of a sine series, $\sum b_n \sin nx$,
must be positive for $0 < x < \pi$ if

$n b_n - (n + 1) b_{n+1} > 0$ and $n b_n \to 0$.

Since the first derivative of (1) (with respect to $x$) results by choosing
$b_n = -2n q^{n^\lambda}$, it follows that (3) must be true if

$n^2 q^{n^\lambda} - (n + 1)^2 q^{(n+1)^\lambda} > 0$

holds for $n = 1, 2, \cdots$. But the last inequality is readily seen to be satisfied
from $n = 1$ onward if, while $\lambda$ is fixed, $q$ tends to 0. This proves that $q^{**}(\lambda)$
exists for every $\lambda > 0$.


4. From these deductions alone, it is quite unexpected that (the best values of)
both $q^*(\lambda)$ and $q^{**}(\lambda)$ turn out to be independent of $\lambda$ when

(4) $0 < \lambda \leq 2$,

i.e., that (1) satisfies both (2) and (3) for $0 < q < 1$, if (4) is assumed. This
fact is of statistical significance, since, on the one hand, it is precisely the
restriction (4) which is necessary and sufficient for the existence of Cauchy's
(symmetric) "stable" distributions (cf. [6], pp. 254-263) and, on the other hand,
the reduction (mod $2\pi$) of the densities of these linear distributions leads to
the functions (1) as angular densities (cf. [9], pp. 477-478); the numerical value
of $q$ ($< 1$) being determined by the "precision" or "dispersion" of the resulting
angular distributions.

Under the necessary restriction (4), the linear analogue of $q^*(\lambda) = 1$ and of
$q^{**}(\lambda) = 1$ was proved in [6], pp. 258-263 and in [8], pp. 71-77, respectively.
It will remain undecided whether the restriction (4) is necessary in either of
the angular cases.
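The assertion that (2) and (3) hold for every $q$ between 0 and 1 under (4) can at least be spot-checked numerically. The sketch below is not a proof: it evaluates a partial sum of (1) on a grid of the half-period and tests positivity and monotonicity for a few arbitrarily chosen pairs of $q$ and $\lambda$. The grid size, the truncation length and the chosen parameter values are assumptions of the example.

```python
import math


def theta(x, q, lam, terms=1000):
    """Partial sum of (1): 1 + 2 * sum_{n>=1} q**(n**lam) * cos(n x)."""
    return 1.0 + 2.0 * sum(
        q ** (n ** lam) * math.cos(n * x) for n in range(1, terms + 1)
    )


def positive_and_decreasing(q, lam, points=60):
    """Check (2) and (3) on an equidistant grid of 0 <= x <= pi."""
    xs = [math.pi * i / points for i in range(points + 1)]
    values = [theta(x, q, lam) for x in xs]
    positive = min(values) > 0.0
    decreasing = all(b <= a + 1e-12 for a, b in zip(values, values[1:]))
    return positive and decreasing


results = {
    (q, lam): positive_and_decreasing(q, lam)
    for q in (0.2, 0.5, 0.7)
    for lam in (0.75, 1.0, 1.5, 2.0)
}
print(all(results.values()))
```

For small $\lambda$ and $q$ close to 1 the coefficients decay slowly, so a longer partial sum would be needed there than the one used here.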


5. Suppose that $\lambda$ has a fixed value in the range (4). Then there exists a
monotone function of $t$, say $\alpha_\lambda(t)$, for which

$\exp(-u^\lambda) = \int_0^\infty \exp(-u^2 t) \, d\alpha_\lambda(t)$

is an identity in $u$, where $0 < u < \infty$ (cf. [1], p. 769, where further references
will be found). Hence, a change of variables shows that

$q^{n^\lambda} = \int_0^\infty q^{n^2 t} \, d\alpha_\lambda(t \, |\log q|^{1 - 2/\lambda})$

is an identity in $q$ and $n$, where $0 < q < 1$ and $n = 0, 1, 2, \cdots$ (the integration
variable is $t$). Consequently, from (1),

$\theta_\lambda(x; q) = \int_0^\infty \theta_2(x; q^t) \, d\alpha_\lambda(t \, |\log q|^{1 - 2/\lambda})$,

where $0 < q < 1$ and $-\infty < x < \infty$. In fact, the legitimacy of the term-by-term
integration is obvious from $0 < q < 1$ and $d\alpha_\lambda \geq 0$ (even though the
integrals are improper).

6. Since $\alpha_\lambda$ is a non-decreasing function, it is clear from the last formula line
that both (2) and (3) will be proved for $0 < q < 1$ and for every $\lambda$ (satisfying (4)),
if it is ascertained that both (2) and (3) hold for $0 < q < 1$ when $\lambda = 2$. But
the case $\lambda = 2$ of (1) is an elliptic theta-function, for which both properties in
question (cf. the diagram in [3], p. 44) are known; a simple proof can be
concluded from what, in Hecke's terminology, is the Eulerian factorization of
$\theta_2(x; q)$, as follows:

According to Jacobi, the factorization of the case $\lambda = 2$ of (1) is

$\theta_2(x; q) = \prod_{n=1}^{\infty} (1 - q^{2n})(1 + 2q^{2n-1} \cos x + q^{4n-2})$

(cf. [7], pp. 64-65). Thus

$\theta_2(x; q) = c_q \prod_{n=1}^{\infty} P(x + \pi; q^{2n-1})$,

where

$c_q = \prod_{n=1}^{\infty} (1 - q^{2n})$

and

(5) $P(x; r) = 1 - 2r \cos x + r^2$, $(0 < r < 1)$,

hence

$P(x; r) > 0$ $(0 < r < 1)$.

Since $0 < q < 1$, this proves the case $\lambda = 2$ of (2). Furthermore, logarithmic
differentiation of the product representation of $\theta_2(x; q)$ gives

$\theta_2'(x; q) = \theta_2(x; q) \sum_{n=1}^{\infty} P'(x + \pi; q^{2n-1}) / P(x + \pi; q^{2n-1})$,

where $f' = df/dx$; so that, by (5),

$P'(x + \pi; r) = -2r \sin x$.

Since $0 < q < 1$, the last three formula lines and the case $\lambda = 2$ of (2) imply that

$\theta_2'(x; q) < 0$ if $0 < x < \pi$,

as claimed by the case $\lambda = 2$ of (3).

This completes the proof of the italicized assertion.
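The factorization used in this last step is easy to confirm numerically. The following sketch compares a partial sum of the series (1) for $\lambda = 2$ with a truncated form of the infinite product; the function names and truncation lengths are choices made for the example.

```python
import math


def theta2_series(x, q, terms=60):
    """Case lambda = 2 of (1): 1 + 2 * sum q**(n*n) * cos(n x)."""
    return 1.0 + 2.0 * sum(q ** (n * n) * math.cos(n * x) for n in range(1, terms + 1))


def theta2_product(x, q, terms=400):
    """Truncated product of (1 - q^(2n)) (1 + 2 q^(2n-1) cos x + q^(4n-2))."""
    value = 1.0
    for n in range(1, terms + 1):
        r = q ** (2 * n - 1)
        value *= (1.0 - q ** (2 * n)) * (1.0 + 2.0 * r * math.cos(x) + r * r)
    return value


for q in (0.3, 0.6):
    for x in (0.0, 1.0, 2.0, math.pi):
        # The two evaluations should coincide to rounding error.
        print(q, x, theta2_series(x, q), theta2_product(x, q))
```

Every factor of the product is positive, which is the positivity argument of the text in computational form.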
A NOTE ON THE FUNDAMENTAL IDENTITY OF SEQUENTIAL ANALYSIS

By G. E. Albert

U. S. Naval Ordnance Plant, Indianapolis

1. Introduction. Let $z_i$, $(i = 1, 2, 3, \cdots)$, be a sequence of real valued
random variables identically distributed according to the cumulative distribution
function $F(z)$. Define the sums $Z_N = z_1 + z_2 + \cdots + z_N$ for every positive
integer $N$. Choose two positive constants $a$ and $b$ and define the random
variable $n$ as the smallest integer $N$ for which one of the inequalities $Z_N \geq a$ or
$Z_N \leq -b$ holds. The notations $P(u \mid F)$ and $E(u \mid F)$ will denote the probability
of $u$ and its expectation respectively assuming that $F$ is the distribution of the $z_i$.
Wald [1] has established the results contained in the following lemmas.

Lemma 1. If the variance of $F(z)$ is positive, $P(n < \infty \mid F)$ equals one.

Lemma 2. If there exists a positive number $\delta$ such that $P(e^z < 1 - \delta \mid F) > 0$
and $P(e^z > 1 + \delta \mid F) > 0$ and if the moment generating function $\varphi(t) = E(e^{zt} \mid F)$
exists for all real values of $t$, then $\varphi(t)$ has one and only one minimum, at a
finite value $t = t_0$.

(1) $E\{e^{Z_n t} [\varphi(t)]^{-n} \mid F\} = 1$

... (1) to be valid ... on a certain interval of the real axis.

... the identity (1) is valid and may be differentiated with respect to $t$ under the
expectation sign any number of times for all real values of $t$.
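For a simple lattice case the left member of the identity (1), the expectation of $e^{Z_n t}[\varphi(t)]^{-n}$, can be evaluated without simulation. The sketch below assumes steps of $+1$ and $-1$ with probabilities $p$ and $1 - p$ and integer boundaries; these choices, and all names in the code, are illustrative assumptions and not part of the note. It propagates the distribution of $Z_N$ over the interior states and accumulates the contribution of every stopping time. The value $t_0$ in the usage lines is the minimizing argument of $\varphi$, where $\varphi(t_0) < 1$, which is the case that needs the extra argument of the proof.

```python
import math


def identity_left_member(p, a, b, t, max_steps=3000):
    """E{exp(Z_n t) * phi(t)**(-n)} for steps +1 (prob. p) and -1 (prob. 1 - p).

    a and b are positive integers; sampling stops when Z_N >= a or Z_N <= -b.
    """
    phi = p * math.exp(t) + (1.0 - p) * math.exp(-t)
    interior = {0: 1.0}          # interior[z] = P(n > N, Z_N = z)
    total = 0.0
    for step in range(1, max_steps + 1):
        following = {}
        for z, mass in interior.items():
            for dz, prob in ((1, p), (-1, 1.0 - p)):
                z_new = z + dz
                if z_new >= a or z_new <= -b:
                    # stopping occurs at n = step with Z_n = z_new
                    total += mass * prob * math.exp(z_new * t) * phi ** (-step)
                else:
                    following[z_new] = following.get(z_new, 0.0) + mass * prob
        interior = following
    return total


t0 = 0.5 * math.log(0.4 / 0.6)   # minimizer of phi for p = 0.6
for t in (t0, -1.0, 0.0, 0.5):
    print(t, identity_left_member(0.6, 3, 3, t))
```

In this example the products of the remaining interior mass with the inverse powers of $\varphi(t)$ tend to zero geometrically, which is the content of the limit relation proved below.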

Proof. The notation $t_0$ will be used consistently to denote the $t$ value at which
$\varphi(t)$ has its minimum.

The proof of the theorem follows Wald's methods quite closely and certain
of the results given in [1] and [2] will be used here without discussion.

Consider first the validity of (1). For an arbitrary positive integer $N$ let $P_N$
be the probability $P(n \leq N \mid F)$ and let $E_N(u \mid F)$ and $E_N^*(u \mid F)$ denote the
conditional expectations of $u$ subject to the respective conditions $n \leq N$ and
$n > N$. Wald [1] has shown that for any finite real value of $t$,

(2) $P_N E_N\{e^{Z_n t} [\varphi(t)]^{-n} \mid F\} + (1 - P_N) [\varphi(t)]^{-N} E_N^*\{e^{Z_N t} \mid F\} = 1$.

Since $\lim_{N \to \infty} P_N E_N\{[\varphi(t)]^{-n} \exp(Z_n t)\}$ is the left member of the identity (1),
it suffices to demonstrate that

(3) $\lim_{N \to \infty} (1 - P_N) [\varphi(t)]^{-N} E_N^*(e^{Z_N t} \mid F) = 0$

for all real values of $t$.

Since $1 - P_N$ tends to zero with increasing $N$ and the expected value $E_N^*$
involved in (3) is bounded independently of $N$ for any fixed $t$, the only source of
difficulty in proving (3) lies in the fact that $\varphi(t)$ may be less than unity on an
interval of the real axis. That difficulty is easily avoided by the following
device. Define the function

(4) $G(x) = [\varphi(t_0)]^{-1} \int_{-\infty}^{x} e^{t_0 z} \, dF(z)$.

Obviously $G(x)$ is a distribution function whose moment generating function
$\psi(t)$ exists for all real $t$. Its mean is zero and its variance is positive as will be
seen from the equations $E(x \mid G) = \varphi'(t_0)/\varphi(t_0)$ and $E(x^2 \mid G) = \varphi''(t_0)/\varphi(t_0)$. It
follows that $\psi(t)$ is never less than unity for real values of $t$.

Let $\Omega$ denote the space of all $z_1, \cdots, z_N$ and let $\Omega(n > N)$ be that subset of $\Omega$
on which $n > N$. One has

$(1 - P_N) [\varphi(t)]^{-N} E_N^*\{e^{Z_N t} \mid F\}$

$= \frac{\int_{\Omega(n > N)} e^{Z_N t} \, dF(z_1) \cdots dF(z_N)}{\int_{\Omega} e^{Z_N t} \, dF(z_1) \cdots dF(z_N)} = \frac{\int_{\Omega(n > N)} e^{Z_N (t - t_0)} \, dG(z_1) \cdots dG(z_N)}{\int_{\Omega} e^{Z_N (t - t_0)} \, dG(z_1) \cdots dG(z_N)}$

$= (1 - Q_N) [\psi(s)]^{-N} E_N^*\{e^{Z_N s} \mid G\}$

where $s = t - t_0$ and $Q_N = P(n \leq N \mid G)$. By Lemma 1, $1 - Q_N$ tends to
zero as $N$ is increased. Thus, since $\psi(s) \geq 1$ for all real $s$ and the expected value
$E_N^*\{e^{Z_N s} \mid G\}$ is bounded independently of $N$ for a fixed $t$, the equation (3) holds
for all real $t$.

The differentiability clause of the theorem requires the following modification
of a very powerful theorem due to Charles Stein [3].

Lemma 3. Under the conditions of Lemma 2, if the minimum $\varphi(t_0)$ of $\varphi(t)$ is
less than unity, there exists a positive number $t_1$ such that

(5) $E\{\exp[n t_1 - n \log \varphi(t_0)] \mid F\} < \infty$.

Proof. If $G$ is the distribution of the $z_i$, by Stein's theorem there exists a
positive number $t_1$ such that $E(e^{n t_1} \mid G)$ is finite. Let $\Omega(n = N)$ denote the
subset of $\Omega$ on which $n = N$. Then

$P(n = N \mid G) = \int_{\Omega(n = N)} dG(z_1) \cdots dG(z_N)$

$= [\varphi(t_0)]^{-N} \int_{\Omega(n = N)} e^{t_0 Z_N} \, dF(z_1) \cdots dF(z_N)$

$\geq P(n = N \mid F) \exp[\min\{a t_0, -b t_0\} - N \log \varphi(t_0)]$.

It follows that

$E(\exp[n t_1 - n \log \varphi(t_0)] \mid F) \leq E\{e^{n t_1} \mid G\} \exp[-\min\{a t_0, -b t_0\}]$

and the lemma is proved.

To continue with the theorem, Wald's proof [2] suffices for the case in which
$\varphi(t_0) \geq 1$. Attention will be given only to the case $\varphi(t_0) < 1$. As pointed out in
section 2 of [2], the differentiability clause of the theorem will be established if
it can be shown that for any finite interval $I$ of the real axis and any pair of
integers $r_1$ and $r_2$ there exists a function $D_{r_1 r_2}(Z_n, n)$ such that for all $t$ in $I$
one has

(6) $D_{r_1 r_2}(Z_n, n) \geq |n^{r_1} Z_n^{r_2} e^{Z_n t} [\varphi(t)]^{-n}|$

and

(7) $E\{D_{r_1 r_2}(Z_n, n) \mid F\} < \infty$.

On referring to Wald's proof and using the inequality $-\log \varphi(t) \leq -\log \varphi(t_0)$ for
all $t$ in $I$, it is seen that there exists a constant $C$ and a positive number $t_2$ such
that the function

$D_{r_1 r_2}(Z_n, n) = C n^{r_1} [\varphi(t_0)]^{-n} (e^{Z_n t_2} + e^{-Z_n t_2})$

satisfies (6) for all $t$ in $I$. To establish (7) use the inequalities (2.4) and (2.6)
in Wald [2] to obtain

(8) $E\{D_{r_1 r_2}(Z_n, n) \mid F\}$

$= C \sum_{N=1}^{\infty} P(n = N \mid F) N^{r_1} [\varphi(t_0)]^{-N} E\{e^{Z_n t_2} + e^{-Z_n t_2} \mid n = N\}$

$\leq C \{e^{a t_2} l(t_2) + e^{-b t_2} l(-t_2)\} E\{\exp[r_1 \log n - n \log \varphi(t_0)] \mid F\}$.

That (7) is indeed satisfied now follows from (5) and the finiteness of the function
$l(t)$ since for a large enough integer $M$ one has

$\sum_{N=M}^{\infty} P(n = N \mid F) \exp[r_1 \log N - N \log \varphi(t_0)]$

$\leq \sum_{N=M}^{\infty} P(n = N \mid F) \exp[N t_1 - N \log \varphi(t_0)] < \infty$.

Thus the expected value on the extreme right in (8) is finite. This completes
the proof of the theorem.


REFERENCES

[1] A. Wald, "On cumulative sums of random variables," Annals of Math. Stat., Vol. 15
(1944), pp. 283-296.

[2] A. Wald, "Differentiation under the expectation sign in the fundamental identity of
sequential analysis," Annals of Math. Stat., Vol. 17 (1946), pp. 493-497.

[3] Charles Stein, "A note on cumulative sums," Annals of Math. Stat., Vol. 17 (1946),
pp. 498-499.


A SIGNIFICANCE TEST AND ESTIMATION IN THE CASE OF
EXPONENTIAL REGRESSION

By D. S. Villars¹

United States Rubber Company, Passaic, N. J.

¹ Present address, Jersey City Junior College, Jersey City, N. J.

1. Introduction. The principal problem under consideration in this note
may be described as follows. Consider a variate, $z$, whose distribution for a
given value of a fixed variate, $t$, is:

(1.1) $f(z \mid t) = \frac{1}{\sigma \sqrt{2\pi}} e^{-(z - a + b e^{-kt})^2 / 2\sigma^2}$

where $a$, $b$, and $k$ are real-valued parameters. The regression of $z$ on $t$ is
exponential, for it follows from (1.1) that the expected value of $z$, given $t$, is:

(1.2) $E(z \mid t) = a - b e^{-kt}$.

On the basis of a random sample $O_N(z_1, t_1; z_2, t_2; \cdots; z_N, t_N)$ it is desired to
test whether $k = 0$ or $\infty$. The problem of "fitting" a curve, $z = a - b e^{-kt}$,
to the sample (i.e., of estimating $a$, $b$, and $k$ from the sample) will also be treated.

As an illustration of how the statistical problems described above arise in
practice, let us consider a typical situation in industrial chemistry. Let the
quantity, $z$, be a property of a latex and let the quantity, $t$, be time. Suppose,
furthermore, that measurement of $t$ is without error but that measurement of $z$
is subject to error; let it be assumed that the observed value in a measurement of
$z$ is a variate having a normal (Gaussian) distribution about the "true value,"
$E(z)$. On the basis of $N$ independent measurements, $z_1, z_2, \cdots, z_N$ of $z$ at times
$t_1, t_2, \cdots, t_N$, respectively, the experimenter may wish to test the hypothesis
that $k = 0$ or $\infty$. If this hypothesis is true the suspected exponential relation
between $z$ and $t$ does not hold; in this case $E(z)$ is a constant ($a - b$, or $a$) and
estimation of the constant from the data is quite straightforward. If the data
conflict with the hypothesis that $k = 0$ or $\infty$, the experimenter may wish to
estimate the parameters, $a$, $b$, and $k$ (i.e., "fit" the curve, $z = a - b e^{-kt}$, to the
data).

The problems considered in this note will be treated only for the case where $N$
is an even integer ($> 6$) and the times $t_1, t_2, \cdots, t_N$ at which measurements of
$z$ are made are such that

(1.3) $t_{2\alpha} - t_{2\alpha - 1} = \Delta$, a constant, $(\alpha = 1, 2, \cdots, n = N/2)$.

The odd time intervals, $t_3 - t_2$, $t_5 - t_4$, etc. do not have to be equal.


2. Test of the hypothesis that $k = 0$ or $\infty$. The space, say $\Omega$, of admissible
values of the parameters in (1.1) is: $\sigma^2 > 0$, $-\infty < a, b, k < +\infty$. Under the
null hypothesis the admissible values of the parameters lie in a subspace of $\Omega$,
say $\omega$, specified as follows: $\sigma^2 > 0$, $-\infty < a, b < +\infty$, $k = 0$ or $\infty$.

Let $y_\alpha = z_{2\alpha}$ and $x_\alpha = z_{2\alpha - 1}$, $(\alpha = 1, \cdots, n = N/2)$. From (1.1) and (1.3)
it follows that the $n$ pairs $x_j, y_j$ are normally and independently distributed with
common variance, $\sigma^2$, that $x_j$ and $y_j$ are independent $(j = 1, 2, \cdots, n)$, and
that

(2.1) $\nu_j = h + m \mu_j$

where $\nu_j = E(y_j)$, $\mu_j = E(x_j)$, $h = a(1 - e^{-k\Delta})$ and $m = e^{-k\Delta}$. The space,
$\Omega'$, of admissible values of the parameters in the joint distribution of $x_j, y_j$,
$(j = 1, \cdots, n)$, is: $\sigma^2 > 0$, $\nu_j = h + m \mu_j$, $-\infty < h < +\infty$, $-\infty < \mu_j < +\infty$;
$0 \leq m < \infty$. The subspace of $\Omega'$, say $\omega'$, associated with the null hypothesis
is: $\sigma^2 > 0$, $\nu_j = \mu_j = c$, where $c = a - b$ or $a$ according as $k = 0$ or $\infty$.
In $\Omega'$, the expected values of $x$ and $y$ lie on a line; in $\omega'$ they lie in a single point.
It is clear that by transforming the original sample $O_N(z_1, t_1; \cdots; z_N, t_N)$ to a
sample $O_n(x_1, y_1; \cdots; x_n, y_n)$ we have reduced the original problem to the
familiar problem of linear regression in which there is "error in both variates".

The slope of the "line of best fit" to the sample points $(x_1, y_1; \cdots; x_n, y_n)$
is [1]:

(2.2) $\hat{m} = \frac{S_{yy} - S_{xx} + \sqrt{(S_{yy} - S_{xx})^2 + 4 S_{xy}^2}}{2 S_{xy}}$

where

$S_{xx} = \sum_{j=1}^{n} (x_j - \bar{x})^2$

$S_{xy} = \sum_{j=1}^{n} (x_j - \bar{x})(y_j - \bar{y})$

$S_{yy} = \sum_{j=1}^{n} (y_j - \bar{y})^2$

$\bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j$, $\bar{y} = \frac{1}{n} \sum_{j=1}^{n} y_j$

($\hat{m}$ is an estimate of $m$ in (2.1)). Since $m = e^{-k\Delta}$ (where $k$ and $\Delta$ are real), it is
intuitively clear that when $\hat{m}$ is non-positive the sample $O_n$ does not conflict
with the null hypothesis. The null hypothesis can be tested by means of the
statistic [2, 144]

(2.3) $F' = \frac{S_{xx} + 2\hat{m} S_{xy} + \hat{m}^2 S_{yy}}{\hat{m}^2 S_{xx} - 2\hat{m} S_{xy} + S_{yy}}$.

The null hypothesis is rejected if $\hat{m}$ is positive and $F'$ is large. Percentage points
of the distribution of $F'$ are given in [2, 140] for $n$ = 3 (1) 15 (5) 30, 40, 60, 120
and for significance levels .001, .01, .05, .10, and .20. These significance
levels, however, were computed for use in cases where the sign of $\hat{m}$ was
irrelevant. It happens that to test the null hypothesis under consideration in this
problem at a significance level $\alpha$ we should use a critical value of $F'$ (given in
[2]) corresponding to a significance level $2\alpha$. The reason for this is that when
the null hypothesis is true the quantities $\hat{m}$ and $F'$ are independent and the
probability that $\hat{m}$ is positive is $\frac{1}{2}$; thus the chance of rejecting the null
hypothesis is $\frac{1}{2}(2\alpha) = \alpha$.

3. Estimation of $a$, $b$, and $k$. If the data do not support the hypothesis that
$k = 0$ or $\infty$, the experimenter may wish to estimate $a$, $b$, and $k$. General
alternative methods of estimating these parameters will now be considered.

(1) Estimate $a$, $b$, and $k$ from $O_N$ by the method of least squares; i.e., solve
the simultaneous equations $\partial S/\partial a = 0$, $\partial S/\partial b = 0$, and $\partial S/\partial k = 0$ for $a$, $b$,
and $k$, where

(3.1) $S = \sum_{i=1}^{N} (z_i - a + b e^{-k t_i})^2$.

The value of $k$ obtained by this method of estimation will not in general be the
same as that computable from $\hat{m}$ in (2.2) and used for the significance testing.

(2) Estimate $k$ by means of (2.2) and the relation $m = e^{-k\Delta}$, then substitute
this estimate into $S$ of (3.1) and estimate $a$ and $b$ by means of least squares.

(3) Estimate $k$ as in (2) and choose, as an estimate of $a$, the intercept of the
"line of best fit" for $O_n$. Then substitute these estimates of $a$ and $k$ into (3.1)
and estimate $b$ by means of least squares. In this case the estimate of $b$ comes
out to be:

(3.2) $\hat{b} = \sum_{i=1}^{N} e^{-\hat{k} t_i} (\hat{a} - z_i) \Big/ \sum_{i=1}^{N} e^{-2\hat{k} t_i}$,

where $\hat{a}$ and $\hat{k}$ are the estimates of $a$ and $k$.

If the values $t_1, t_2, \cdots, t_N$ are such that $t_{i+1} - t_i = \Delta$, $(i = 1, 2, \cdots, N - 1)$,
the following estimation procedure might be used.

(4) Let

$y_i = z_{i+1}$, $x_i = z_i$, $(i = 1, 2, \cdots, N - 1)$,

and treat the $(N - 1)$ pairs of values $(x_1, y_1; \cdots; x_{N-1}, y_{N-1})$ as a sample of
size $(N - 1)$. Using this sample, estimate $k$, $a$, and $b$ in a manner similar to
that in (2) or (3). It should be noted that this sample is not a random sample
owing to the dependence among the $(N - 1)$ elements.

The procedure in alternative (1) is very laborious and time-consuming. The
procedure in (2) and (3) can be carried out quickly and easily. In (1) the
method of least squares yields the same results as would be obtained from
application of the method of maximum likelihood. Examples of estimation by
procedures (3) and (4) are given in the next section.
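Procedure (3) can be sketched in a few lines of Python. The example below uses noise-free synthetic data, so that the fitted values should reproduce the generating constants. The reading of "intercept" as the quantity obtained from $h = a(1 - m)$, with the fitted line passing through the point of means, is an assumption of this sketch, as are the function name and the synthetic constants.

```python
import math


def fit_by_method_3(times, z):
    """Procedure (3): k from the slope (2.2), a from the fitted line, b from (3.2)."""
    x, y = z[0::2], z[1::2]              # x_j = z_{2j-1}, y_j = z_{2j}
    delta = times[1] - times[0]          # common spacing required by (1.3)
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((u - x_bar) ** 2 for u in x)
    syy = sum((v - y_bar) ** 2 for v in y)
    sxy = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))
    d = syy - sxx
    m_hat = (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)
    k_hat = -math.log(m_hat) / delta     # from m = exp(-k * delta)
    a_hat = (y_bar - m_hat * x_bar) / (1.0 - m_hat)   # assumed: h = a(1 - m)
    weights = [math.exp(-k_hat * t) for t in times]
    b_hat = sum(w * (a_hat - v) for w, v in zip(weights, z)) / sum(w * w for w in weights)
    return a_hat, b_hat, k_hat


# Synthetic, noise-free observations at t = 1, 3, ..., 31 (hypothetical constants).
times = [float(t) for t in range(1, 32, 2)]
z = [1.0 - 0.25 * math.exp(-0.12 * t) for t in times]
print(fit_by_method_3(times, z))
```

With real, noisy measurements the slope must fall strictly between 0 and 1 for the logarithm and the division to be meaningful; the sketch does not guard against other cases.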


4. Example. The accompanying table lists experimentally observed values
of a property of a latex obtained at biweekly intervals. Using the first, third,
etc., quantities as $x_j$ and the remaining ones as $y_j$, the sums of squares and
products of deviations are found to be:

$S_{xx} = .035510$, $\bar{x} = 0.9195$

$S_{xy} = .025645$

$S_{yy} = .023414$, $\bar{y} = .9365$.

Substituting these values in equation (2.2) and computing the other constants
from equation (2.1) we get: $\hat{m} = 0.791596$, $\hat{a} = 1.0009$, and $\hat{k} = 0.1168$. The
$F'$ ratio (2.3) is 17.93. Entering Table I of [2], we find that for eight point pairs
a value of $F' = 16.5$ may be expected only one time in one hundred. On
excluding the possibility of negative values of $\hat{m}$, this corresponds to the 0.5%
significance level. The exponential relationship is thus concluded to be highly
significant.
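The arithmetic of this example can be retraced with a short Python sketch of (2.2), (2.3) and the relation $m = e^{-k\Delta}$. Two assumptions are made and should be read as such: $S_{xx}$ is taken as .035510, the value that reproduces the reported slope 0.791596 to six decimals, and the estimate of $a$ is computed from $h = a(1 - m)$ with the fitted line passing through the point of means. The spacing $\Delta = 2$ weeks is taken from the biweekly schedule.

```python
import math


def best_fit_slope(sxx, sxy, syy):
    """Slope of the line of best fit, equation (2.2)."""
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)


def f_prime(m, sxx, sxy, syy):
    """Ratio (2.3): sum of squares along the fitted line over that across it."""
    along = sxx + 2.0 * m * sxy + m * m * syy
    across = m * m * sxx - 2.0 * m * sxy + syy
    return along / across


sxx, sxy, syy = 0.035510, 0.025645, 0.023414   # sxx as assumed in the lead-in
x_bar, y_bar, delta = 0.9195, 0.9365, 2.0

m_hat = best_fit_slope(sxx, sxy, syy)
k_hat = -math.log(m_hat) / delta
a_hat = (y_bar - m_hat * x_bar) / (1.0 - m_hat)
print(m_hat, k_hat, a_hat, f_prime(m_hat, sxx, sxy, syy))
```

The computed ratio comes out a little below 18, above the tabled one-percent value of 16.5 quoted in the text.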

Evaluation of $b$ by equation (3.2), method 3, gives 0.2560, if all 16 values are
used. The equation calculated from the data is thus:

(4.1) $z = 1.0009 - 0.2560 \, e^{-0.1168 t}$.

The alternative procedure, method 4, would be to use all the $z_i$ points for the
estimation of $a$ and $k$. This leads to the following values of the computation
quantities:

$S_{xx} = \sum_{i=1}^{15} z_i^2 - 15 \bar{x}^2 = 0.052374$; $\bar{x} = 0.9223$

$S_{xy} = \sum_{i=1}^{15} z_i z_{i+1} - 15 \bar{x} \bar{y} = .036924$

$S_{yy} = \sum_{i=2}^{16} z_i^2 - 15 \bar{y}^2 = .035430$; $\bar{y} = .9381$.

Note that the difference $S_{yy} - S_{xx}$ used in the formula for $\hat{m}$ cancels out all
intervening squares between the first and last:

$S_{yy} - S_{xx} = z_{16}^2 - z_1^2 - 15(\bar{y}^2 - \bar{x}^2)$.

However, the data excluded thereby are in effect included in the new $S_{xy}$.

The final values obtained by the fourth procedure are: $\hat{m} = 0.796596$, $\hat{a} =
1.0000$, and $\hat{k} = 0.1137$. The writer does not know whether the peculiar
transference of data from $S_{yy} - S_{xx}$ to $S_{xy}$ characteristic of procedure 4 improves the
accuracy of the fit or hurts it. It is his personal preference to use procedure 3.

TABLE I

 t (weeks)    z      t (weeks)    z      t (weeks)    z      t (weeks)    z
     1       .776        9       ...        17       .942       25       .955
     3       .852       11       ...        19       .938       27       .993
     5       .850       13       ...        21       .979       29       .985
     7       .869       15       .948       23       .975       31      1.013

5. Acknowledgement. The writer wishes to acknowledge with thanks his
gratitude to Drs. T. W. Anderson, Jr. and David F. Votaw, Jr. for many
suggestions and discussions concerning this problem and for much help in clarifying
the presentation of the concepts.

ON THE POWER EFFICIENCY OF A t-TEST FORMED
BY PAIRING SAMPLE VALUES

By John E. Walsh

Princeton University

1. Introduction. Consider two equal sized samples, one from a normal
population with mean $\mu$ and the other from a normal population with mean $\nu$. Let
$x_1, \cdots, x_n$ be the sample values from the population with mean $\mu$ and $y_1, \cdots, y_n$
the values from the population with mean $\nu$. If the two populations have the
same variance and the two samples are independent, the most powerful tests
for comparing $\mu$ and $\nu$ using these samples (one-sided and symmetrical
two-sided) are based on the statistic

$t_2 = \frac{[\bar{x} - \bar{y} - (\mu - \nu)] \sqrt{n(n - 1)}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{i=1}^{n} (y_i - \bar{y})^2}}$

which has a Student t-distribution with $2n - 2$ degrees of freedom. Tests based
on $t_2$ also have the desirable property of being invariant under permutation of
the data in each sample.

Sometimes, however, it is useful to combine the sample values in the form

$z_i = (x_i - y_i)$, $(i = 1, \cdots, n)$.

Examples:

(a). When the samples are independent but it is not known that the two
populations have the same variance (Behrens-Fisher problem).

(b). When there may be correlation between $x_i$ and $y_i$, $(i = 1, \cdots, n)$,
this correlation being the same for each value of $i$ (i.e. $x_i$ is independent of $y_j$
if $i \neq j$ while each pair $x_i, y_i$, $(i = 1, \cdots, n)$, has the same normal bivariate
distribution).

In both (a) and (b) the $z_i$ are independently normally distributed with the
same variance and mean $\mu - \nu$.

The Student t-test for comparing $\mu$ and $\nu$ using the $z_i$ is based on the statistic

$t_1 = \frac{[\bar{z} - (\mu - \nu)] \sqrt{n(n - 1)}}{\sqrt{\sum_{i=1}^{n} (z_i - \bar{z})^2}} = \frac{[\bar{x} - \bar{y} - (\mu - \nu)] \sqrt{n(n - 1)}}{\sqrt{\sum_{i=1}^{n} [x_i - y_i - (\bar{x} - \bar{y})]^2}}$

which has a Student t-distribution with $n - 1$ degrees of freedom. These tests
are not invariant under permutation of the data in each sample.
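The two statistics, the pooled one with $2n - 2$ degrees of freedom and the paired one with $n - 1$ degrees of freedom, differ only in the sum of squares in the denominator. The sketch below computes both for a small made-up data set and shows that rearranging one sample changes the paired statistic but not the pooled one. The function names and data are illustrative assumptions; `diff` is the hypothesized value of the difference of the two means.

```python
import math


def paired_t(x, y, diff=0.0):
    """Student statistic of the differences z_i = x_i - y_i (n - 1 d.f.)."""
    n = len(x)
    z = [u - v for u, v in zip(x, y)]
    z_bar = sum(z) / n
    ss = sum((w - z_bar) ** 2 for w in z)
    return (z_bar - diff) * math.sqrt(n * (n - 1)) / math.sqrt(ss)


def pooled_t(x, y, diff=0.0):
    """Pooled two-sample statistic for equal sample sizes (2n - 2 d.f.)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss = sum((u - x_bar) ** 2 for u in x) + sum((v - y_bar) ** 2 for v in y)
    return (x_bar - y_bar - diff) * math.sqrt(n * (n - 1)) / math.sqrt(ss)


x = [1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 1.0, 3.0]
y_rearranged = [3.0, 1.0, 1.0, 0.0]

print(paired_t(x, y), paired_t(x, y_rearranged))   # changes with the pairing
print(pooled_t(x, y), pooled_t(x, y_rearranged))   # unchanged by rearrangement
```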

If it is true that all the sample values are independently distributed with the
same variance $\sigma^2$, efficiency will be lost by using the test based on $t_1$ instead of
the most powerful test based on $t_2$. The purpose of this note is to determine the
power efficiency of the tests based on $t_1$ as compared with the corresponding
tests based on $t_2$ for this case.


TABLE I

Power Function Values for the $t_1$ and $t_2$ Tests

Test    n      Approx.       α      Approx. Values of Power Function
               Efficiency           δ = 1/2   δ = 1    δ = 1 1/2   δ = 2

t_1     6      87%          .05     .276      .674     .933        .994
t_2     5.2                 .05     .275      .672     .932        .994
t_1     6      82.5%        .025    .159      .480     .822        .970
t_2     ...                 .025    .160      ...      ...
t_1     8      90%          .05     .355      .812     .985
t_2     7.2                 .05     .354      .813     .985
t_1     8      86.5%        .025    .220      .674     .952        .998
t_2     6.9                 .025    .225      .675     .951        .998
t_1     8      82%          .01     .112      ...      .843
t_2     6.55                .01     .112      ...      .842
t_1     10     92%          .05     .425      .898     .997
t_2     ...                 .05     .425      .897     .997
t_1     10     90%          .025    ...       .802     .988
t_2     9                   .025    .291      .803     .988
t_1     10     85.5%        .01     .159      .626     .950        .999
t_2     8.55                .01     .159      .627     .950        .999
t_1     15     95.5%        .05     .579      ...
t_2     14.3                .05     .579      ...
t_1     15     93%          .025    .437      .950     1.000
t_2     ...                 .025    .437      .949     1.000
t_1     15     90%          .01     .278      ...      .998
t_2     ...                 .01     .278      ...      .998
t_1     25     98%          .05     .784      .999
t_2     24.5                .05     .784      .999
t_1     25     96%          .025    .670      .998
t_2     24                  .025    .670      .998
t_1     25     94.5%        .01     .514      .992
t_2     23.7                .01     .514      .992

Consideration is limited to one-sided tests, which is not a serious limitation
since any two-sided test can be considered as a combination of two one-sided
tests. Table II contains approximate power efficiencies of one-sided tests for
$n \geq 4$ at the significance levels $\alpha$ = .05, .025, .01.

It is found that the efficiency of the $t_1$ test increases with the sample size but
is high even for small size samples.

2. Outline of computations. The method of obtaining power efficiencies
used here will be that outlined in [1]. Essentially this consists in computing the
power function for the test based on $t_1$ and then adjusting the sample size for
the corresponding test based on $t_2$ until its power function is approximately the
same as for the $t_1$ test. The ratio of the sample size (perhaps fractional) of the
adjusted $t_2$ test to that of the $t_1$ test is called the power efficiency of the $t_1$ test.
Intuitively this efficiency measures the fraction of the total available information
which is being used when the $t_1$ test is applied (since the $t_2$ test is most powerful).

TABLE II

Approximate Power Efficiencies for Given n and α

 α \ n     4       5       6       7       8       9       10      15      25      ∞

 .05     82.5%   85%     87%     88.5%   90%     91%     92%     95.5%   98%     100%
 .025    77%*    80%*    82.5%   84.5%   86.5%   88.5%   90%     93%     96%     100%
 .01     73%     75.5%   78%     80%     82%     84%     85.5%   90%     94.5%   100%

* These values were obtained by comparison with the corresponding values for
α = .05 and .01.


It is easily seen from symmetry that a one-sided 4 test of /i < r has the same 
power efficiency as the corresponding one-sided 4 test of m > ". Thus it 

sufficient to consider the one-sided tests of p > v - 

The power function is found as a function of the parameter 5, where 


5 = 


r V2' 


Most of the approximate power efficiencies were determined by using the 

normal approximation given in [2] to compute the th results 

approximation was used for fractional values of «. Table i contains 
of those computations for one-sided tests of M > v- q{ n a = 0 5, ,oi 

Exact values of the power func ion^ ^ of the p0WCT function values 

can be found from the tables m .3]. - 1' ^ exact va i ues shows that, 
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tion of power efficiencies, so that little error in power efficiencies would be 
expected if the approximation were used for » = 0, a — .01 or n = 4, a ~ .05, 
the efficiencies given in Table II for ii = 4, a = .05 and n = 4, 6, a = .01 were 
obtained from (he exact values by graphical interpolation and cross-interpolation. 

Power efficiencies were not considered for n < 4 because of the difficulties
of interpolation and the inexactness of the normal approximation in this range.

For n = ∞, both test statistics have a normal distribution with zero mean and unit
variance. Thus the power efficiency is 100% at all significance levels for
this case.

These computations furnish approximate power efficiencies for n = 6, 8, 10,
15, 25, ∞ at α = .05, .025, .01, and for n = 4 at α = .05 and .01. The other
approximate power efficiencies listed in Table II were obtained by graphical
interpolation from these values.

The results of this note can be roughly summarized for n < 15 by stating
that of the 2n sample values

(i) approximately 1.6 values are lost at the 5% significance level,

(ii) approximately 2.1 values are lost at the 2.5% significance level,

(iii) approximately 2.8 values are lost at the 1% significance level,

if the tests based on 4 are used instead of the corresponding tests based on 4.
Examination of Table I shows that the number of sample values lost decreases as n
increases for n > 15.
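
The "values lost" figures can be checked against the tabulated efficiencies. The sketch below assumes that a power efficiency e at sample size n corresponds to 2n(1 - e) of the 2n sample values being lost; this reading is inferred from the summary above rather than stated as a formula in the note. The names `TABLE_II` and `values_lost` are illustrative only, and the efficiencies are those listed in Table II for finite n.

```python
# Approximate power efficiencies transcribed from Table II (finite n only).
TABLE_II = {
    0.05:  {4: 0.825, 5: 0.85, 6: 0.87, 7: 0.885, 8: 0.90,
            9: 0.91, 10: 0.92, 15: 0.955, 25: 0.98},
    0.025: {4: 0.77, 5: 0.80, 6: 0.825, 7: 0.845, 8: 0.865,
            9: 0.885, 10: 0.90, 15: 0.93, 25: 0.96},
    0.01:  {4: 0.73, 5: 0.755, 6: 0.78, 7: 0.80, 8: 0.82,
            9: 0.84, 10: 0.855, 15: 0.90, 25: 0.945},
}


def values_lost(n, alpha):
    """Return 2n(1 - e), the assumed number of the 2n sample values lost."""
    efficiency = TABLE_II[alpha][n]
    return 2 * n * (1.0 - efficiency)


if __name__ == "__main__":
    # Print the implied loss for every tabulated (alpha, n) pair.
    for alpha in sorted(TABLE_II, reverse=True):
        row = ", ".join(
            f"n={n}: {values_lost(n, alpha):.2f}" for n in sorted(TABLE_II[alpha])
        )
        print(f"alpha={alpha}: {row}")
```

Under this reading the loss at the 5% level for n = 10 is 20 × 0.08 = 1.6 values, in line with item (i).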

NOTE ON THE LIAPOUNOFF INEQUALITY FOR ABSOLUTE MOMENTS

By Maurice H. Belz

The University of Melbourne


For a variate x measured from the mean of the population, the absolute
moment of order r is defined by

$$\nu_r = \int_{-\infty}^{\infty} |x|^r \, dF(x),$$

where F(x) is the cumulative distribution function. Treating r as continuous,
we have

$$\frac{d\nu_r}{dr} = \int_{-\infty}^{\infty} |x|^r \log_e |x| \, dF(x),$$

the integral on the right existing if $\nu_{r+1}$ exists.
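Write $y = \log_e \nu_r$. Then we have

$$\frac{dy}{dr} = \frac{1}{\nu_r}\frac{d\nu_r}{dr} = \frac{1}{\nu_r}\int_{-\infty}^{\infty} |x|^r \log_e |x| \, dF(x),$$

$$\frac{d^2 y}{dr^2} = \frac{1}{\nu_r^2}\left\{ \int_{-\infty}^{\infty} |x|^r \, dF(x) \int_{-\infty}^{\infty} |x|^r \log_e^2 |x| \, dF(x) - \left[ \int_{-\infty}^{\infty} |x|^r \log_e |x| \, dF(x) \right]^2 \right\} \ge 0,$$

by Schwarz's inequality.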



It follows that the function y is convex (or exceptionally a straight line), and,
on referring to the figure, it appears that

(1) MQ ≤ MQ'

for all chords PR. If the abscissae of the points L, M, N are c, b, a, respectively,
where c ≤ b ≤ a, the inequality (1) leads at once to the relation

$$\log_e \nu_b \le \frac{a-b}{a-c} \log_e \nu_c + \frac{b-c}{a-c} \log_e \nu_a .$$

Hence

$$\nu_b^{\,a-c} \le \nu_c^{\,a-b} \, \nu_a^{\,b-c},$$

which is the usual form of the Liapounoff Inequality.
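
As a numerical illustration, the inequality can be checked on the empirical distribution of a small sample, which is itself a distribution measured from its mean. This is a minimal sketch: the function names are hypothetical, and the check is done on the logarithmic form, (a - c) log ν_b ≤ (a - b) log ν_c + (b - c) log ν_a, for orders c ≤ b ≤ a.

```python
from math import log


def absolute_moment(values, r):
    """Absolute moment of order r about the mean of the empirical distribution."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) ** r for v in values) / len(values)


def liapounoff_gap(values, a, b, c):
    """Right side minus left side of the logarithmic form of the inequality.

    A non-negative result means the inequality holds for these orders.
    """
    if not (c <= b <= a):
        raise ValueError("orders must satisfy c <= b <= a")
    nu_a = absolute_moment(values, a)
    nu_b = absolute_moment(values, b)
    nu_c = absolute_moment(values, c)
    return (a - b) * log(nu_c) + (b - c) * log(nu_a) - (a - c) * log(nu_b)


if __name__ == "__main__":
    sample = [1.0, 2.0, 4.0, 7.0, 11.0]
    print(liapounoff_gap(sample, a=4, b=2, c=1))  # positive for this sample
```

For this sample ν_1 = 3.2, ν_2 = 13.2 and ν_4 = 330, so that ν_2³ ≈ 2300 does not exceed ν_1² ν_4 ≈ 3379.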

REMARK ON THE NOTE "A GENERALIZATION OF 
WARING’S FORMULA” 

By T. N. E. Greville 

U. S. Public Health Service 

Before submitting for publication the note “A generalization of Waring’s 
formula,” Annals of Math. Stat., Vol. 15 (1944), pp. 218-219, the author made a
diligent effort to ascertain, through correspondence with mathematicians and
actuaries both in this country and abroad, whether the generalized formula in
question had been previously published, and none of the authorities communicated
with knew of its prior publication. However, it has now come to his
attention that the formula was published in essentially the same form by Hermite
in the article “Sur la formule d’interpolation de Lagrange”, Journal für die
reine und angewandte Mathematik (“Crelle’s Journal”), Vol. 84 (1878),
pp. 70-79.

1. Estimation of Parameters in Truncated Pearson Frequency Distributions.
A. C. Cohen, University of Georgia.

Given a truncated univariate Pearson frequency distribution, parameters of the
complete distribution are required. Karl Pearson and Alice Lee (Biometrika, Vol. 6 (1915),
pp. 69-09) and R. A. Fisher (Introduction to Mathematical Tables, Vol. 1, British Assn.
Adv. Sci., 1931, pp. xxvi-xxxv) obtained solutions of the truncated normal distribution
with a single tail missing. The present paper presents three general methods of solution
applicable to any of the Pearson distributions. The first utilizes moments of a higher order
than are required to characterize corresponding complete distributions. The order of
the highest moment required is increased by one for each missing tail. The second method,
applicable when only a single tail is missing, utilizes the terminal ordinate at the point of
truncation and moments of the same order as required to characterize the complete
distribution. The terminal ordinate is evaluated by successive approximations. The third
method utilizes only the first two moments, but requires that the given distribution be
further truncated and that moments be computed both before and after the additional
truncations. This latter method can also be applied to complete distributions to avoid
direct computation of third and fourth order moments.


2. Distribution of a Root of Determinantal Equation. D. N. Nanda, University 
of North Carolina. 

The joint distribution of the roots of a determinantal equation was given by P. L. Hsu
in 1939 and the distribution of any one of the roots was studied by S. N. Roy. The present
paper, however, gives a different method of working out the distribution of any root,
specified by its place in a monotonic arrangement. This method enables us to express the
distribution of a root of a certain determinantal equation in terms of a linear combination
of products of incomplete beta integrals and in terms of the distribution of a root of
lower-order determinantal equations.


3. The Power of Certain Non-Parametric Tests of Independence. Wassily
Hoeffding, University of North Carolina.

Several tests of independence have been proposed which are based on statistics depending
only on the ranks of the sample values. Under the hypothesis $H_0$ of independence the
distribution of such statistics does not depend on the form of the parent distribution.
Two of these statistics, Spearman’s rank correlation coefficient and the Lindeberg-Kendall
statistic based on the number of inversions in the permutation of the ranks, are shown to
be asymptotically normally distributed in samples from any population (the limiting
distributions being singular in certain degenerate cases). The asymptotic distribution
of the coefficients reveals that the corresponding tests of independence are inconsistent
([...] the probability of rejecting $H_0$ [...] true), and at least one of them is biased
in the limit. [...] sample sizes and some sizes of the critical region there do not [...]
independence based on ranks. But there do exist rank tests of in[...] consistent, and
hence unbiased in the limit. Examples of such tests are given.

4. Some Significance Tests for the Mean Using the Sample Range and Midrange. 
John E, Walsh, Princeton University. 
Consider a sample of size n, (2 < n < 10), drawn from a normal population with mean μ.
Let $x_n$ be the largest value and $x_1$ the smallest value of the sample. Significance tests are
developed to compare μ with a given hypothetical value $\mu_0$ by use of the sample. These
significance tests are based on the quantity
$D = \left(\tfrac{1}{2}(x_1 + x_n) - \mu_0\right)/(x_n - x_1)$ = [(sample
midrange) − (hypothetical mean)]/(sample range). One-sided and symmetrical tests are
considered. Values of $D_\alpha$ such that $\Pr(D > D_\alpha \mid \mu = \mu_0) = \alpha$ are computed for α = .05, .025,
.01, .005. These values of $D_\alpha$ can be used to obtain one-sided tests at the .05, .025, .01, .005
significance levels and symmetrical tests at the .10, .05, .02, .01 significance levels.
Efficiencies are computed for one-sided tests at the .05 and .01 significance levels. The
efficiency is at least 90% for n ≤ 6 at the .05 significance level and for n ≤ 8 at the .01 level.
The range-midrange test can be applied without computation through the use of an easily
constructed graph. The application of a test requires only the plotting of the sample
point $(x_1, x_n)$ on this graph.
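
The statistic itself is simple to compute. The following minimal sketch evaluates D = [(sample midrange) − (hypothetical mean)]/(sample range) for a given sample; the function name is hypothetical, and the critical values $D_\alpha$ are not reproduced in the abstract, so no test decision is attempted here.

```python
def range_midrange_statistic(sample, mu0):
    """Return D = (midrange - mu0) / range for a sample and hypothetical mean mu0."""
    smallest, largest = min(sample), max(sample)
    sample_range = largest - smallest
    if sample_range == 0:
        raise ValueError("sample range is zero; D is undefined")
    midrange = 0.5 * (smallest + largest)
    return (midrange - mu0) / sample_range


if __name__ == "__main__":
    # Midrange is 3.0 and range is 4.0, so D = (3.0 - 2.0) / 4.0.
    print(range_midrange_statistic([1.0, 2.0, 3.0, 5.0], mu0=2.0))
```

A one-sided test would then compare D with a tabulated $D_\alpha$ for the sample size in question.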

5. Testing Compound Symmetry in a Normal Multivariate Distribution. David
F. Votaw, Jr., Princeton University.

Let F(X) be the d.f. of a t-order vector variate X (t ≥ 3). Suppose the components of X
are divided into mutually exclusive and exhaustive subsets. F(X) is said to be compound
symmetric, for the given division of its variates into subsets, if it is invariant over all
permutations of its variates within these subsets. F(X) is completely symmetric if the
invariance holds over all permutations of its variates. If F(X) is normal and compound
symmetric, then within each subset of variates the means are equal, the variances are equal
and the covariances are equal, and between any two subsets of variates the covariances
are equal. Testing hypotheses of compound or complete symmetry in a normal F(X)
is of interest, for example, in studying psychological examinations and in medical research.

In this paper likelihood ratio criteria are developed for testing various hypotheses
involving compound symmetry in regard to a normal distribution and to k normal
distributions (k ≥ 2). Given that the corresponding null hypothesis is true, the moments
of each criterion are obtained explicitly and the distribution of each criterion is identified
as the product of independent beta variates (in the case of a single normal distribution,
the distributions are given explicitly for t = 3, 4, and 5 for certain divisions of the variates
into subsets). In a previous paper Wilks has given results on a very thorough study of
the sampling theory of likelihood ratio criteria for various hypotheses involving complete
symmetry in regard to a normal distribution.

6. Effects of Non-Normality at High Significance Levels. Harold Hotell¬ 
ing, University of North Carolina. 

The effects of non-normality in the underlying population on the probabilities of
significance by customary statistical tests are not well understood, in spite of numerous
attacks, both mathematical and experimental, on the problem. Chung’s recent proof that
the Student ratio t has, in samples from an arbitrary population, a distribution
approaching normality for large samples tends to confirm the common idea that
non-normality makes little difference if only the sample is fairly large, but this holds
only for a fixed range of values of t while the sample number N increases. The tail areas
beyond a deviation which increases with N in certain ways often behave quite differently
than in sampling from a normal population. If p is the probability that $|t| > t_0$ in
samples of N from a normal population and p' is the corresponding probability for another
population, it is shown that lim p'/p may be zero or infinite or may take any
finite value, even when the non-normal distribution involved is of simple and realistic
continuous forms. The conditions that this limit be unity are concerned only with the
shoulders of the population histogram, and have nothing to do with its moments or its
behavior at infinity or at its mean. This suggests that caution should be used in applying
familiar tests with high significance levels; that further calculations should be directed
toward making this caution quantitatively definite; and that the use of sample moments
or cumulants cannot lead to the most appropriate criterion of non-normality for this
purpose.


7. On the Problem of Similar Regions. E. L. Lehmann, University of California,
Berkeley, and Henry Scheffé, University of California, Los Angeles.

If $X = (X_1, \cdots, X_n)$ is a set of random variables with a joint probability density
depending on a set of parameters $\theta = (\theta_1, \cdots, \theta_m)$, and if $T = (T_1, \cdots, T_m)$ is a set of
sufficient statistics for θ, then Neyman (Phil. Trans. Roy. Soc. London, Vol. 236 (1937),
pp. 333-380) has proved that a region w in the space of X is similar with respect to θ if it
has the following structure: the intersections w(t) of w with the surfaces T = t have the
property that the conditional probability of the sample point X falling into w given that
T = t does not depend on t. In the present paper a necessary and sufficient condition is
found for the regions with the above structure to be the only similar regions. This
condition is shown to be satisfied for a certain class K of probability densities which contains
as special cases all densities for which the totality of similar regions has been previously
determined. In particular the partial differential equations which Neyman (Annals of
Math. Stat., Vol. 12 (1941), pp. 46-76) assumed were satisfied in his solution of the problem
of similar regions are solved and it is shown that any density satisfying these equations
belongs to the above class K.


8. Fourth Degree Exponential Function. L. A. Aroian and Marguerite
Darkow, Hunter College.

It is shown that the fourth degree exponential function is supported by the Bernoulli
probability function and the hypergeometric probability function as well as being the
function for which the method of moments is the best method according to the criterion of
maximum likelihood. In the general situation six moments, at most, are needed. The
function is classified into two general groups depending on symmetry or asymmetry and
each case is divided again into unimodal and bimodal distributions. Examples show that
the function is very successful in graduating the main Pearson types and the Gram-Charlier
Type A frequency function. Various generalizations of the exponential function are
indicated. In addition to its wide generality, the greatest practical advantage of the new
system is the simplicity of the numerical calculations.


9. A General Weak Limit Theorem for Independent Distributions. P. L. Hsu,
University of North Carolina. (Read by title.)

For every positive integer n let there be n distribution functions $F_{n1}(x), \cdots, F_{nn}(x)$.
Assume that $\max_{1 \le i \le n} \{1 - F_{ni}(x) + F_{ni}(-x)\} \to 0$. Let $F_n(x)$
be the convolution $F_{n1}(x) * F_{n2}(x) * \cdots * F_{nn}(x)$. Let

$$\varphi(t) = imt + \int_{-\infty}^{\infty} \left( e^{itx} - 1 - \frac{itx}{1 + x^2} \right) \frac{1 + x^2}{x^2} \, dG(x),$$

with G(x) nondecreasing and $G(\infty) - G(-\infty) < \infty$. Let F(x) be the
(infinitely divisible) distribution law having exp φ(t) as its characteristic function.
In order to have $\lim_{n \to \infty} F_n(x) = F(x)$ at every continuity point of F(x), it is necessary
and sufficient that the following relations hold at every x > 0 such that ±x are continuity
points of G(y):

(I) $$\lim_{n \to \infty} \sum_{i=1}^{n} \int_{|y| > x} dF_{ni}(y) = \int_{|y| > x} \frac{1 + y^2}{y^2} \, dG(y),$$

(II) $$\lim_{n \to \infty} \sum_{i=1}^{n} \left\{ \int_{|y| < x} y^2 \, dF_{ni}(y) - \left( \int_{|y| < x} y \, dF_{ni}(y) \right)^2 \right\} = \int_{|y| < x} (1 + y^2) \, dG(y),$$

(III) $$\lim_{n \to \infty} \sum_{i=1}^{n} \int_{|y| < x} y \, dF_{ni}(y) = m + \int_{|y| < x} y \, dG(y) - \int_{|y| > x} \frac{1}{y} \, dG(y).$$

10. On the Maximum Partial Sums of Sequences of Independent Random
Variables. K. L. Chung, Princeton University.

The asymptotic behavior of the maximum partial sums of a sequence of independent
random variables is studied in this paper. Two groups of new limit theorems are
established under general conditions. The first group deals with theorems of the weak type.
The limiting distribution of the maximum partial sums is obtained with an estimate of
the remainder, thus improving a recent result of Erdös and Kac. Another estimate is
obtained for a different domain of variation, which plays an essential role in the sequel.
These results correspond to the sharper forms of the central limit theorem. In the second
group, theorems of the strong type are obtained, giving precise lower bounds (in the sense
of probability) for the maximum partial sums. These results form the exact counterpart
to the general form of the law of the iterated logarithm, due to Feller, which gives the
precise upper bounds. A summary of the main results and methods has appeared in Proc.
Nat. Acad. of Sci., Vol. 33 (1947), pp. 132-136.


11. Some Results on the Distribution of Quadratic Forms From Gaussian
Stochastic Processes. (Preliminary report). Herman Rubin, Cowles
Commission.

If one considers the estimation of the parameters of a Gaussian stochastic process
whose elements are continuous functions from the functional values over a finite interval,
one often finds that certain parameters can be estimated exactly, and certain parameters
cannot. This result often depends on the distribution of quadratic functionals whose
arguments are elements of the stochastic process under consideration. In this paper, it
is shown that the elements of a certain class of quadratic functionals have distributions
concentrated at a point, and that the elements of a different class do not; in this latter case,
the characteristic function is computed.


12. Some Significance Tests for the Median which are Valid under Very General 

Conditions. (Preliminary Report) John E. Walsh, Princeton University. 

(Read by title.) 

Consider n independent values drawn from populations necessarily satisfying only: 1)
Each population has a unique median. 2) The median has the same value φ for each
population. 3) Each population is symmetrical. 4) Each population is continuous. (It
is to be emphasized that no two of the values are necessarily drawn from the same
population.) Significance tests are derived for φ on the basis of 1)-4). These significance tests
are based on order statistics of certain combinations of order statistics, each combination
being either a single order statistic of the n values or one-half the sum of two order statistics.
The tests are invariant under permutation of the n values and reasonably efficient if the
values represent a sample from a normal population. The significance levels are of the
form $r/2^n$, ($r = 1, \cdots, 2^n - 1$). Each value of r can be obtained for some one-sided
significance test. Thus any significance level can be closely approximated if n is large. The
major disadvantage of these tests is the limited number of suitable significance levels
available for small values of n. This disadvantage is partially eliminated by the development of
tests which have a specified significance level if the values are a sample from a normal
population and a significance level bounded near this specified value if only 1)-4) necessarily
hold. Results based on 1)-4) are applied to several well known statistical problems.
Tests are obtained for the mean on the basis of a large number of independent values from
populations having the mean but little else in common. Also generalized results are
obtained for the Behrens-Fisher problem, quality control, slippage tests, the sign test and
cases where some of the n values are dependent.

13. Loss of Information in t-tests with Unbalanced Samples. (Preliminary
Report) John E. Walsh, Princeton University. (Read by title.)

Consider two normal populations, the first with mean $a_1$ and variance $\sigma_1^2$, the second with
mean $a_2$ and variance $\sigma_2^2$, while $\sigma_1/\sigma_2$ has a known value C. If the hypothesis $a_1 = a_2$ is to
be tested by a t-test (one-sided or symmetrical) using $n_1$ sample values from the first
population and $n_2$ values from the second population ($n_1 + n_2 = n$, fixed), it is shown that this
experiment is most powerful when $n_1/n_2 = \sigma_1/\sigma_2$ (integer considerations neglected). The
t-tests satisfying this condition will be referred to as balanced t-tests. Thus information
will be lost by not using a balanced experiment. A quantitative measure of the information
lost by using given values of $n_1$ and $n_2$ is determined by the total sample size m, ($m_1 + m_2 =
m$), of the balanced t-test (same significance level) which has approximately the same power.
Then n − m sample values are wasted by using ($n_1$, $n_2$) rather than ($m_1$, $m_2$), i.e. only
100m/n% of the information obtainable per observation is used by ($n_1$, $n_2$). A
symmetrical t-test with significance level 2α has the same value of m as a one-sided t-test with
significance level α. For one-sided t-tests with significance level α: $m = \tfrac{1}{2}\left(B + \sqrt{B^2 - 8A}\right)$,
where $B = 2 + A + K_\alpha^2/2$, $A = (C + 1)^2 \left[1 - K_\alpha^2/2(n - 2)\right] \left[C^2/n_1 + 1/n_2\right]^{-1}$, and $K_\alpha$ is the
standardized normal deviate exceeded with probability α. This approximation to m is
valid for m ≥ 5 if α = .05, m ≥ 6 if α = .025, m ≥ 7 if α = .01, m ≥ 8 if α = .005. (A
fractional value of m represents an interpolated measure of the sample size of the equivalent
balanced experiment.)
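
The approximation for m lends itself to a short computational sketch. It takes the formula to read m = ½(B + √(B² − 8A)), with B = 2 + A + K²/2 and A = (C + 1)²[1 − K²/(2(n − 2))][C²/n₁ + 1/n₂]⁻¹, where K is the standardized normal deviate exceeded with probability α. That reading of the printed expression is an assumption, supported by the fact that it returns m = n exactly when n₁/n₂ = C. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def equivalent_balanced_size(n1, n2, C, alpha):
    """Approximate total size m of the balanced one-sided t-test of equal power.

    n1, n2 : sample sizes from the first and second populations
    C      : known ratio sigma_1 / sigma_2
    alpha  : one-sided significance level
    """
    n = n1 + n2
    # Standardized normal deviate exceeded with probability alpha.
    K = NormalDist().inv_cdf(1.0 - alpha)
    A = (C + 1.0) ** 2 * (1.0 - K * K / (2.0 * (n - 2))) / (C * C / n1 + 1.0 / n2)
    B = 2.0 + A + K * K / 2.0
    return 0.5 * (B + sqrt(B * B - 8.0 * A))


if __name__ == "__main__":
    # A balanced design wastes nothing; an unbalanced one has m < n.
    print(equivalent_balanced_size(10, 10, C=1.0, alpha=0.05))
    print(equivalent_balanced_size(15, 5, C=1.0, alpha=0.05))
```

The quantity n − m then measures the sample values wasted, and 100m/n the percentage of the information used.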

14. Some Theorems on the Bernoullian Multiplicative Process. T. E. Harris,
Princeton University. (Read by title.)

A single entity may have j descendants with probability $p_j$, (j = 0, 1, 2, ...). Each
first generation entity has then the same procreative probabilities, etc. Let

$$f(s) = p_0 + p_1 s + \cdots .$$

If $z_n$ is the number of entities in the nth generation, it is known that $P(z_n = j)$ is given by
the coefficient of $s^j$ in the nth iterate $f[f \cdots (f)] = f_n(s)$. Let $E z_1 = x$, 1 < x < ∞.
Conditions are given insuring that as n → ∞ the cumulative distribution of the variate $z_n/x^n$
approaches a limit-function which is absolutely continuous except for a possible single
jump. Let g(u) be the corresponding frequency function. If f(s) is a polynomial of degree
k, let $q = \log_x k/(\log_x k - 1)$. Otherwise, q = 1. Then $g(u) \cdot \exp(u^{q+\epsilon})$ [is, is not] summable
(0, ∞) according as ε is [negative, positive]. Behavior of g(u) near u = 0 is also considered.
Special cases are considered where g(u) = constant [...] a positive integer. Maximum
likelihood estimates for the parameters $p_0, p_1, \cdots$, and x are obtained as functions
of n successive values $z_1, z_2, \cdots$. Consistency, in a certain sense, is proved. A
specialized method is given for finding the moment-generating function of the variate N,
the smallest value of n such that $z_n = 0$.
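
The statement that P(z_n = j) is the coefficient of s^j in the nth iterate of f(s) can be illustrated directly when f(s) is a polynomial. The sketch below is illustrative only: the function names and the offspring probabilities in the example are assumptions, not taken from the paper. Polynomials are stored as coefficient lists, lowest power first.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def compose(f, g):
    """Return the coefficients of f(g(s)) using Horner's scheme."""
    result = [f[-1]]
    for coeff in reversed(f[:-1]):
        result = poly_mul(result, g)
        result[0] += coeff
    return result


def generation_distribution(p, n):
    """Coefficients of the nth iterate f_n(s); entry j is P(z_n = j).

    p holds the offspring probabilities p_0, p_1, ..., p_k of a single entity.
    """
    f_n = [0.0, 1.0]  # f_0(s) = s: the process starts from one entity
    for _ in range(n):
        f_n = compose(p, f_n)  # f_n(s) = f(f_{n-1}(s))
    return f_n


if __name__ == "__main__":
    p = [0.2, 0.3, 0.5]  # hypothetical offspring law with mean x = 1.3
    dist = generation_distribution(p, 2)
    mean = sum(j * pj for j, pj in enumerate(dist))
    print(dist, mean)  # the mean of z_2 should be close to x**2 = 1.69
```

Dividing z_n by x^n, as in the abstract, rescales each generation so that its mean stays equal to one.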



NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. George R Albert has been appointed to an associate professorship at the 
University of Tennessee. 

Dr. T. W. Anderson, Jr. has been promoted to an assistant professorship in
the Department of Mathematical Statistics at Columbia University. He is on
leave the first half of the 1947-48 academic year at the Institute of Actuarial
Mathematics and Mathematical Statistics, Stockholm University, as a Guggenheim
Fellow. During the second half of the academic year he will be at Cambridge
University.

Associate Professor Max Astrachan has been promoted to a full professorship 
at Antioch College, Yellow Springs, Ohio. 

Associate Professor T. A. Bancroft, who has been at the University of Georgia,
Athens, Georgia, is now with the Statistical Laboratory, Alabama Polytechnic 
Institute, Auburn, Alabama. 

Dr. M. S. Bartlett of Cambridge University has been appointed as Professor 
of Mathematical Statistics at the University of Manchester, Manchester, 
England. The position is a newly created one. Professor Bartlett indicates 
that this position is believed to be the first official professorship in mathematical
statistics in England. 

Professor M. A. Brumbaugh has accepted a position with the Bristol Labora¬ 
tories Inc., Syracuse 1, New York. 

Dr. Donald A. Darling has been appointed Research Associate at Cornell 
University. 

Professor D. B. DeLury of the Virginia Polytechnic Institute has accepted a
position with the Ontario Research Foundation, 43 Queens Park, Toronto 5, 
Canada. 

Professor Abel Gauthier of the University of Montreal has been appointed 
Head of the Institute of Mathematics and Assistant-Secretary of the Faculty of 
Science at that institution. 

Dr. Casper Goffman, former assistant professor in the Mathematics Department,
University of Kentucky, is now in the Mathematics Department, University
of Oklahoma, Norman, Oklahoma.

Mr. Philip Hardy has returned to the General Electric Company at Warren,
Ohio after serving at Wright Field.

Dr. Carl F. Kossack, who has been with the Navy Department in Washington,
D. C., as an Air Intelligence Specialist, has accepted an associate professorship in
the Department of Mathematics at Purdue University.

Mr. Frank Jones Massey, Jr. is now teaching in the Department of Mathe¬ 
matics, University of Maryland, College Park, Maryland. 
Dr. William Burton Michael, who has been Lecturer in Mathematics, Psychology
and Educational Psychology at the University of South[...],
has now accepted an assistant professorship in the Department of Psychology,
Princeton University. He is also a member of the Research Department,
College Entrance Examination Board at Princeton.

Mr. Bernard Ostle, a former teaching assistant, School of Business Administration,
University of Minnesota, is now at Iowa State College, Ames, Iowa.

Mr. Maurice H. Quenouille, who was formerly with the Rothamsted Experimental
Station, Harpenden, Herts, England, has accepted the position of
Lecturer in Statistics, Marischal College, University of Aberdeen, Scotland.

Dr. James A. Rafferty left the Department of Pathology, University of
Rochester in June and has been appointed Chief of the Department of Statistics,
Air University, School of Aviation Medicine, Randolph Field, Texas.

Miss Mary Ann Savas has accepted a position with General Motors, Detroit,
Michigan.

Professor George J. Stigler, formerly with Brown University, is now teaching
in the Department of Economics, Columbia University, New York, New York.

Professor E. L. Welker has resigned an associate professorship in mathematics
at the University of Illinois to become Associate in Mathematics in the Bureau of
Medical Economic Research of the American Medical Association.

Mr. Sol M. Wezelman, who completed his master’s degree in actuarial science 
at the University of Michigan in June, has accepted a position as Assistant 
Actuary in the North Dakota State Department of Insurance, Bismarck. 

Dr. Bertram Yood has received his doctorate at Yale and is now on the staff 
at Cornell University. 

Mr. Earl K. Yost, Jr. has accepted a position with the General Electric Co. at 
the Hanford Engineering Project, Richland, Washington.

Professor James G. Smith, of Princeton University, died at Princeton on 
November 28, 1946. 


Beginning with the October issue, the quarterly journal Mathematical Tables 
and Other Aids to Computation will publish a new feature section, “Automatic 
Computing Machinery,” designed to disseminate information and news on 
research and development in the field of high-speed automatic calculating 
machinery. Material should fall under the general headings of Bibliography, 
Technical Developments, Discussion (including correspondence), and News. 
Contributions to this section are invited and should be addressed to Dr. E. W.
Cannon, Head of the Mathematics Group, Machine Development Laboratory, 
National Bureau of Standards, Washington, D. C. 


Institute of Numerical Analysis Established 
Plans have been completed for the establishment of one of the newest units of 
the National Bureau of Standards—the Institute of Numerical Analysis at the 
University of California at Los Angeles, according to an announcement by Dr.
Edward U. Condon, Director of the Bureau. 

One of the giant high-speed electronic computing machines, now under devel¬ 
opment by the Bureau of Standards, will be installed at the Institute when 
completed. Design specifications call for high memory capacity and auto¬ 
matically sequenced mathematical operations from start to finish at speeds 
attainable only with electronic equipment.

The Institute has two primary functions. The first is research in applied
mathematics aimed at developing methods of analysis which will extend the use 
of the high-speed electronic computers. The second is to act as a service group 
for Western industries, research institutions, and government agencies. The 
service function will include not only the use of the machines for problem solving 
but also assistance in the formulation of problems in applied mathematics of the 
more complex and novel types. Service operations are to be initiated immediately,
using the latest types of commercially available computing equipment.

The decision to locate the Institute at the University of California at Los
Angeles was made after a nation-wide survey by the National Bureau of Standards.
Centers in the East and Middle West were considered as well as the Far
West, but Los Angeles, it was decided, offered the widest range of possibilities
for an Institute of Numerical Analysis. Concentration of aircraft industries and
the presence of several major scientific institutions were critical in the choice of
Los Angeles.


Election of Fellows 

The Board of Directors announced at the Yale Meeting the election of the
following members as Fellows of the Institute: Theodore W. Anderson, Jr.,
Alexander C. Aitken, David H. Blackwell, Georges Darmois, Ragnar Frisch,
Robert C. Geary, Frederick Mosteller, Gerhard Tintner, Charles P. Winsor and
John Wishart.


New Members 


The following persons have been elected to membership in the Institute
(June 1 to August 29, 1947):

Baldwin, Helen Mildred, B.S. (Cornell) Research Associate in Statistics, Atomic Energy
Project, 2IS Avenue C, Rochester 6, N. Y.

Blank, Paul M., A.B. Teaching asst. and grad. student, Univ. of Calif., Box 6$@, Fair
Oaks, Calif.

Bowden, George Edwin, B.S. (Duke) Teaching asst., Math. Dept., White Hall, Cornell
Univ., Ithaca, N. Y.

Bradley, Ralph Allan, M.A. (Queen’s Univ.) Grad. student, Univ. North Carolina,
Wellington, Ontario, Canada.

Burton, Kenneth John, Head of Statistics Section, British Employers’ Confederation, 16
Rutherwyke Close, Ewell, Surrey, England.

Carlson, Phillip G., Jr., A.M. (Columbia) 148 Cornell Street, Roslindale 31, Mass.




Carol, Bernard, M.S.E. (Columbia) Graduate student at Columbia Univ., 15 West 96th
Street, N. Y.

C, ' ck, m.°71;wl“X S ™ 18t “" ,1 " n ' fMk *-* 

si's: nZliTsT" at “- 1 “™- 

- »*-. 

Diver, M. L., M.E. (Purdue) Consulting Engineer, P.O. Box 1016, San Antonio 6, Texas.

Erasmus, Josias C., M.Sc. (Univ. of Stellenbosch, South Africa) Research Officer,
Grootfontein College of Agriculture, Middelburg, C.P., South Africa.

Gottlieb, Morris J., Ph.D. (Wash. Univ., St. Louis) Member of the Institute for Advanced
Study, Washington University, St. Louis, Mo.

Greenwood, Joseph Arthur, A.B. (Harvard) Student at Harvard University, 68 Oxford
St., Cambridge 38, Mass.

Gysbers, Jack C., M.A. (Univ. of Calif.) Teaching asst., Dept. of Math., Univ. of Calif.,
MSI) Berkeley Way, Berkeley 4, Calif.

Haskind, Mina, H.H. (Brooklyn College) Student at Brooklyn College, 768 Eastern
Parkway, Brooklyn IS, New York.

Hauser, Dr. Philip M., Ph.D. (Univ. of Chicago) University of Chicago, Chicago 37, Ill.

Hoyt, Cyril J., Ph.D. (Univ. of Minn.) Research Associate, Dept. of Education,
University of Chicago, Chicago, Ill.

Kern, Enrique Roberto, First Assistant, Institute of Biometry, Univ. of Buenos Aires,
Rivadavia 8854, Buenos Aires, Argentina.

Mark, Abraham M., Ph.D. (Cornell) Mathematics Department, Univ. of Wisconsin,
Madison, Wisconsin.

Moss, George GII, B.A. (St. John’s College, Annapolis) Actuarial Statistician,
Metropolitan Life, #77/ Morris Ave., N. Y. 58, N. Y.

Phillips, Bernard E., A.M. (Columbia) Box Ufl, Cathedral Station, New York SB, N. Y.

Radvanyi, Laszlo, Ph.D. (Univ. of Heidelberg) Professor of Economics, National Univ. of
Mexico, Donato Guerra 1, desp. 207, Mexico, D. F.

Richardson, John M., Ph.D. (Cornell) Member of Technical Staff, Bell Telephone
Laboratories, Inc., Murray Hill, New Jersey.

Royston, Robert W., M.S. (Univ. of Mich.) Asst. Prof., Math. Dept., Wash. & Lee Univ.,
117 W. Washington St., Lexington, Virginia.

Saves, Mary A., A.B. (Univ. of Mich.) Student at Univ. of Mich., 684 E. Second St.,
Monroe, Mich.

Shepard, David H., A.B. (Univ. of Mich.) Research Analyst, Army Security Agency,
BOB Randolph Street, Falls Church, Virginia.

Throdahl, Monte C., B.S. (Iowa State College) Research Chemist in Charge of Rubber
Lab., Monsanto Chemical Co., Nitro, West Virginia.

Uchytil, Jan, Doctor of Science, Chief of Production Control Dep. in Central Federation
of Czech. Industry, Praha II, Prikopy 14, Czech.

Vergara, Jose, Doctor of Engine[illegible], Professor of Economics, Madrid, Chief
of the Bureau of Statistics, [illegible], 5262 S. Blackstone Ave., Chicago
87, Illinois.

Wei, Dzung-shu, Ph.D. (Univ. of Iowa) Prof. and Head of Math. Dept., St. John’s Univ.,
Shanghai, 129 East 10th St., New York S, N. Y.

Wolfson, Jacob, B.A. (New York College) Statistician, Social Security Adm., 81,5 Brunswick
Road, Essex, Maryland.



REPORT ON THE NEW HAVEN MEETING OF THE INSTITUTE 

The Tenth Summer Meeting of the Institute of Mathematical Statistics was 
held at Yale University, New Haven, Connecticut, Tuesday, September 2
through Thursday, September 4, 1947. The meeting was held in conjunction
with the summer meetings of the American Mathematical Society and the
Mathematical Association of America. The following 150 members of the
Institute attended the meeting: 

C. B. Allendoerfer, R. L. Anderson, H. E. Arnold, L. A. Aroian, H. M. Huron, J. L. Barnes,
W. D. Baten, R. E. Bechhofer, A. A. Bennett, Joseph Berkson, D. H. Blackwell, C. I. Bliss,
Colin Blyth, Jr., A. E. Brandt, G. M. Brown, 11. H. Brown, O. P. Bruno, P. T. Bruyere,
Mrs. P. T. Bruyere, J. H. Bushey, B. H. Camp, G. C. Campbell, Pi Lam Cluuiri, K. L. Chung,
W. G. Cochran, A. C. Cohen, Jr., E. P. Coleman, T. F. Cope, G. M. Cox, C. C. Craig, E. L.
Crow, H. B. Curry, G. B. Dantzig, M. D. Darkow, B. B. Day, Bernard Ummctile, C. E.
Dieulefait, C. W. Dunnett, Churchill Eisenhart, L. R. Elveback, M. W. Eudey, H. P. Evans,
William Feller, C. D. Ferris, M. M. Flood, R. M. Foster, H. A. Freeman, J. E. Freund, H. P.
Geiringer, M. J. Gottlieb, J. Arthur Greenwood, Evelyn CirtKumiun, F. E. Grubbs, H. T.
Guard, P. R. Halmos, Max Halperin, M. H. Hansen, B. I. Hart, Mina Haskind, Wassily
Hoeffding, R. H. Hoskins, Harold Hotelling, A. S. Householder, Jaroslav Janko, Irving
Kaplansky, Leo Katz, Oscar Kempthorne, E. M. Kennedy, W. L. Kichline, C. J. Kirchen,
L. F. Knudsen, H. S. Konijn, C. F. Kossack, Jack Laderman, H. G. Landau, E. L. Lehmann,
R. A. Leibler, Walter Leighton, Jr., F. C. Leone, Joseph Lev, Howard Levene, Julius Lieblein,
Arthur Linder, S. B. Littauer, H. D. Lowry, H. F. MacNeish, P. J. McCarthy, John
Mandel, H. B. Mann, Sophie Marcuse, F. J. Massey, Margaret Merrell, E. B. Mode, M. E.
Moore, Frederick Mosteller, D. N. Nanda, P. M. Neurath, M. G. Neurdenburg, G. E. Noether,
M. L. Norden, H. W. Norton, P. S. Olmstead, A. L. O'Toole, E. R. Ott, T. K. Oxtoby,
Edward Paulson, M. P. Peisakoff, G. B. Price, J. A. Rafferty, L. J. Reed, C. J. Rees, P. R.
Rider, John Riordan, H. E. Robbins, Milton da Silva Rodrigues, A. C. Rosander, Ernest
Rubin, Herman Rubin, Frank Saidel, M. M. Sandomire, Arthur Sard, Max Sasuly, F. E.
Satterthwaite, E. D. Schell, Jack Sherman, Rosedith Sitgreaves, Andrew Sobczyk, Milton
Sobel, Herbert Solomon, Mortimer Spiegelman, Arthur Stein, Henry Teicher, R. M. Thrall,
Gerhard Tintner, M. N. Torrey, J. W. Tukey, D. F. Votaw, Jr., Abraham Wald, H. M.
Walker, J. E. Walsh, R. M. Walter, J. H. Watkins, Dzung-shu Wei, IS. S. Weiss, S. S. Wilks,
C. P. Winsor, H. O. Wold, Jacob Wolfowitz, C. A. Wright, Bertram Yood.

The Tuesday afternoon session was devoted to a symposium on 2 x 2 tables 
with Professor Lowell J. Reed of Johns Hopkins University serving as chairman. 
Addresses were given on Tests of Significance by Dr. Churchill Eisenhart, National
Bureau of Standards; Estimation by Dr. Charles P. Winsor, Johns Hopkins
University; and Non-Standard Cases by Dr. Joseph Berkson, Mayo Clinic.
Discussants were Mr. William F. Taylor, Dr. Frederick Mosteller, Professor
David H. Blackwell and Professor John W. Tukey. The attendance was
approximately 130. 

The first Wednesday morning session was devoted to contributed papers.
Professor John W. Tukey of Princeton University presided. The attendance
was approximately 85. The following three papers were presented:

1. Estimation of Parameters in Truncated Pearson Frequency Distributions.

Professor A. C. Cohen, University of Georgia.

2. Distribution of a Root of a Determinantal Equation.
Mr. D. N. Nanda, University of North Carolina.

3. The Power of Certain Non-Parametric Tests of Independence.
Dr. Wassily Hoeffding, University of North Carolina.

The second Wednesday morning session was held with Professor Will Feller, 
President of the Institute, presiding. Professor R. A. Fisher, University of 
Cambridge, gave the address under the title The Fitting of Gene Frequencies to 
Data for Genotypes. The attendance was approximately 160.

The membership business meeting of the Institute was held at 9:16, Thursday 
morning, in 102 Chittenden Hall with President Feller presiding. The attendance
was approximately 55. It was voted to make certain changes in the
By-Laws and in particular to raise the dues to $7.00 per year. (An exception is
made for those living outside the Western Hemisphere.) Morris Hansen,
Chairman of the Committee on Planning and Development, initiated a lively
discussion with reference to desirable changes in the Constitution.

On Thursday morning at 10:30, with President Feller presiding, Professor A. 
Wald of Columbia University presented the Henry Lewis Rietz Lecture on 
Sequential Estimation and Multi-Decisions. The attendance was approximately 
150. 

A joint session with the American Mathematical Society was held early 
Thursday afternoon at which Professor S. S. Wilks of Princeton University gave
a lecture on Sampling Theory of Order Statistics. Professor Harold Hotelling of 
the University of North Carolina was the presiding officer. The attendance was 
approximately 300. 

This session was followed by another joint session with the American Mathematical
Society which was devoted to contributed papers. Professor John W.
Tukey presided at this session and the attendance was approximately 115. The 
following seven papers were presented: 


1. Some Significance Tests for the Mean Using the Sample Range and Midrange. 

Mr. John Walsh, Princeton University 

2. Testing Compound Symmetry in a Normal Multivariate Distribution. 

Dr. David F. Votaw, Jr., Princeton University 

3. [title partly illegible] High Significance Levels.

Professor Harold Hotelling, [affiliation illegible]

4. [title illegible]

[author illegible], University of California, Berkeley and Professor Henry
[illegible], [illegible] at Los Angeles.

5. The Fourth Degree Exponential Function.

Dr. Leo A. Aroian and Professor Marguerite Darkow, Hunter College

6. On the Maximum Partial Sums of Sequences of Independent Distributions.

Dr. K. L. Chung, Princeton University

7. Some Results on the Distribution of Quadratic Forms from Gaussian Stochastic
Processes.

Mr. Herman Rubin, Cowles Commission.

The following four papers were presented by title:

8. A General Weak Limit Theorem for Independent Distributions.
Professor P. L. Hsu, University of North Carolina.

9. Some Significance Tests for the Median which are Valid under Very General Conditions
(Preliminary Report).

Mr. John E. Walsh, Princeton University.

10. Loss of Information in t-tests with Unbalanced Samples (Preliminary Report).
Mr. John E. Walsh, Princeton University.

11. Some Theorems on the Bernoullian Multiplicative Process.
Mr. T. E. Harris, Princeton University.

Abstracts of all these papers appear elsewhere in this issue of the Annals. 

A beer party in honor of the foreign statisticians attending the meeting was 
held in the dining room of Saybrook College on Wednesday evening. A joint 
dinner with the American Mathematical Society and the Mathematical Association
of America was held on Thursday evening.

C. C. Craig, 

Acting Secretary. 